Abstract

Many methods have been developed to estimate the set of relevant variables in a sparse linear model $Y = X\beta + \varepsilon$, where the dimension $p$ of $\beta$ can be much higher than the length $n$ of $Y$. Here we propose two new methods based on multiple hypotheses testing, for ordered and for non-ordered variables. Our procedures are inspired by the testing procedure proposed by Baraud et al. [2]. The new procedures are proved to be powerful under some conditions on the data, and their properties are non-asymptotic. They gave better results in estimating the set of relevant variables than both the False Discovery Rate (FDR) and the Lasso, in the common case ($p < n$) as well as in the high-dimensional case ($p > n$).
1. Introduction

Recent technologies have provided scientists with a new kind of data: very high-dimensional data, especially from high-throughput DNA/RNA chips in biology. Unravelling the relevant variables (genes, for example) underlying an observation is a well-known problem in statistics and is still one of the major current challenges. Indeed, with a large number of variables there is often a desire to select a smaller subset that not only fits as well as the full set of variables, but also contains the most important ones. Discovering the relevant variables leads to higher prediction accuracy, an important criterion in variable selection.

Many methods have been developed to estimate the set of relevant variables in the linear model $Y = X\beta + \varepsilon$, where the dimension $p$ of $\beta$ can be much higher than the length $n$ of $Y$. In particular, many model selection methods are based on a penalized criterion. The best known is probably the Lasso, introduced by Tibshirani [14]. It is a penalization of the least squares estimate that shrinks some irrelevant coefficients to zero, and so yields an estimate of the set of relevant variables. Many studies have been conducted on the Lasso and many results are available, for example consistency of the Lasso in high-dimensional linear regression [16], sparsity oracle inequalities [5], and variable selection in high-dimensional graphs with the Lasso [12]. The Lasso has several variants, such as the adaptive Lasso [9], the bootstrap Lasso [1] or the Group Lasso [8]. An $\ell_1$ penalization has also been used in Sparse-PLS, which induces a limited number of variables in each PLS direction. See Tenenhaus [13] for an introduction to PLS, and Le Cao et al. [11] for further details on Sparse-PLS. Other kinds of penalization have also been used, such as the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). Both methods penalize the likelihood by the number of variables included in the model. Although most model selection methods were developed to perform in low dimension, some of them apply in the high-dimensional case. Others were developed specifically to be powerful when $p$ is larger than $n$, such as the Dantzig selector [7]. Nonetheless, a recent paper shows that, under a sparsity condition on the linear model, the Dantzig selector and the Lasso exhibit similar behavior [4]. Penalized criteria are not, however, the only way to perform model selection. For instance, the False Discovery Rate (FDR) procedure, developed in the context of multiple hypotheses testing by Benjamini and Hochberg [3], was used in variable selection by Bunea et al. [6]. This procedure has been extended to high-dimensional analysis and is presently used in biology for QTL research and transcriptome analysis. A p-value is calculated for each variable $X_i$ from the regression of $Y$ onto that variable, and selection is performed through an adjusted threshold.

Most of the selection methods cited above give quite good results when $p$ is lower than $n$, but the results get worse as $p$ grows larger than $n$. In the setting of this paper, variable selection when $p$ is much higher than $n$, those methods are disappointing. Moreover, most theoretical results are only asymptotic and only prove consistency of the estimators. Non-asymptotic results are more desirable, since small samples are common in practice. Methods using a penalized criterion such as AIC, BIC or any other penalization of the likelihood have another major drawback, of a computational nature: a search through all the $2^p$ possible subsets of variables may be needed, and the cost of this search grows exponentially with $p$.

This paper deals with the problem of recovering the set of relevant variables in a sparse linear model when $p$ can be lower or far higher than $n$. We consider the regression model

$$Y = X\beta + \varepsilon \tag{1}$$

where $Y$ is the observation of length $n$, $X = (X_1,\dots,X_p)$ is the matrix of the $p$ variables, and $\beta$ is an unknown vector of $\mathbb{R}^p$. The noise $\varepsilon$ is a Gaussian vector with i.i.d. components, $\varepsilon \sim \mathcal{N}_n(0, \sigma^2 I_n)$, where $I_n$ is the identity matrix of $\mathbb{R}^n$ and $\sigma$ is some unknown positive quantity. We set $J = \{j,\ \beta_j \neq 0\}$ and $|J| = k_0$. We denote $\beta_J = (\beta_j)_{j \in J}$. Let $\mu = \mathbb{E}(Y) = X\beta$, and let $\mathbb{P}_\mu$ be the distribution of $Y$ under model (1).
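To make model (1) concrete, the following sketch draws one data set from it. It is an illustration under assumptions that the text does not make. The design columns are i.i.d. standard Gaussian and are rescaled so that $\langle X_i, X_i\rangle_n = 1$, the normalization used in Section 2.1. The helper name `simulate_model` is hypothetical.

```python
import random


def simulate_model(n, p, beta, sigma, seed=0):
    """Draw (X, Y) from Y = X beta + eps with eps ~ N_n(0, sigma^2 I_n).

    X is stored column-wise: X[i] is the (i+1)-th variable, a list of length n.
    Each column is rescaled so that <X_i, X_i>_n = sum_j X_ij^2 / n = 1.
    """
    rng = random.Random(seed)
    X = []
    for _ in range(p):
        col = [rng.gauss(0.0, 1.0) for _ in range(n)]
        scale = (sum(v * v for v in col) / n) ** 0.5
        X.append([v / scale for v in col])
    mu = [sum(beta[i] * X[i][j] for i in range(p)) for j in range(n)]  # mu = X beta
    Y = [mu[j] + rng.gauss(0.0, sigma) for j in range(n)]
    J = {i + 1 for i in range(p) if beta[i] != 0}  # 1-based support, k0 = len(J)
    return X, Y, mu, J


beta = [2.0, -1.5, 1.0] + [0.0] * 7
X, Y, mu, J = simulate_model(n=20, p=10, beta=beta, sigma=1.0)
print(sorted(J), len(J))  # [1, 2, 3] 3
```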

The aim of this paper is to estimate $J$, the set of relevant variables in (1). We distinguish two frameworks.

On the one hand, the variables $X_1,\dots,X_p$ are assumed to be ordered independently of $Y$. We define a procedure for estimating $J$ that is powerful under some conditions on the data, either when $p < n$ or when $p > n$. These properties are non-asymptotic. This procedure is a multiple hypotheses testing method based on Baraud et al. [2]. It performs several tests to decide whether $\mathbb{E}(Y)$ belongs to $V$, some linear subspace of $\mathbb{R}^n$, or to one of a suitable collection of subspaces containing $V$.

On the other hand, the variables are not assumed to be ordered. We provide a procedure to estimate $J$ when $\sigma$ is known, and a similar procedure when $\sigma$ is unknown. Both procedures are proved to be powerful under some conditions on the data, and their properties are also non-asymptotic.

This paper is organized as follows. Section 2 presents the first procedure to estimate $J$, in the ordered variable case. Non-ordered variable selection is considered in Section 3. Section 4 presents a simulation study comparing several variable selection methods.



2. Ordered variable selection 

2.1. The case p < n 

First, the common case $p < n$ is considered. The family $(X_i)_{1\le i\le p}$ is assumed to be linearly independent and to be given an order independently of $Y$. In this section we focus on ordered variables, which means that the relevant variables are assumed to come first, i.e. $J = \{1,\dots,k_0\}$. Hence an estimate of $k_0$ gives an estimate of $J$, and this section focuses on the estimation of $k_0$.

Let $k$ be a positive integer and let $V_k$ denote a linear subspace of $\mathbb{R}^n$. The following results are based on a test of the null hypothesis "$\mu = \mathbb{E}(Y)$ belongs to $V_k$" against the alternative that it does not. As proposed in Baraud et al. [2], we consider a finite collection of linear subspaces of $V_k^\perp$, $\{S_{k,t},\ t \in \mathcal{T}\}$, to test the null hypothesis. The index set $\mathcal{T}$ is allowed to depend on the number of observations $n$ or on the number of parameters $p$.

Let $\{\alpha_t,\ t \in \mathcal{T}\}$ be a suitable collection of numbers in $]0,1[$. The testing procedure presented in Baraud et al. [2] performs several Fisher tests, of level $\alpha_t$, of the null hypothesis

$$H_k:\ \{\mu \in V_k\} \quad \text{against the alternative} \quad \{\mu \in V_k + S_{k,t}\}.$$

The null hypothesis is rejected if at least one of the Fisher tests rejects it. Our procedure performs the tests $(H_k)_{k\ge 0}$ in turn, for a suitable collection of linear subspaces $(V_k)_{k\ge 0}$, until the null hypothesis is accepted.

Let us introduce some notation that is used throughout this section. Set $\|s\|_n^2 = \sum_{i=1}^n s_i^2/n$. For $k \in \mathbb{N}$ and $t \in \mathcal{T}$, we set $V_{k,t} = V_k \oplus S_{k,t}$, and we denote by $D_{k,t}$ and $N_{k,t}$ the dimensions of $S_{k,t}$ and of $V_{k,t}^\perp$ respectively. For any subspace $V$, $\Pi_V$ denotes the orthogonal projector onto $V$. $\bar F_{D,N}(u)$ denotes the probability for a Fisher variable with $D$ and $N$ degrees of freedom to be larger than $u$. For all $(x,y) \in (\mathbb{R}^n)^2$ we write $\langle x,y\rangle_n = \sum_{i=1}^n x_i y_i/n$, and for $a \in \mathbb{R}$, $\lfloor a \rfloor$ denotes the integer part of $a$.

For all $i \in \{1,\dots,p\}$, $X_i$ is assumed to be normalized: $\langle X_i, X_i\rangle_n = 1$.
As the family $(X_i)_{1\le i\le p}$ is ordered, a natural choice of the collection $V_k$ is the following: set $V_k = \mathrm{span}(X_1,\dots,X_k)$ for $1 \le k \le p$, and $V_0 = \{0\}$. With this choice of $V_k$, and since $(X_i)_{1\le i\le p}$ is a linearly independent family, $\dim(V_k) = k$ for all $k \ge 0$.

For $k \in \{0,\dots,p-1\}$, let $t_{\max} = \lfloor \log_2(p-k) \rfloor$. Let $\mathcal{Q}_{k,t_{\max}} = \{S_{k,t},\ t \in \{0,\dots,t_{\max}\}\}$ be a collection of linear subspaces of $V_k^\perp$, where

$$S_{k,t} = \mathrm{span}(X_{k+1},\dots,X_{k+2^t}) \cap V_k^\perp, \qquad \forall t \in \{0,\dots,t_{\max}\} = T_k. \tag{2}$$

With this collection, $D_{k,t} = 2^t$ and $N_{k,t} = n - (k + 2^t)$.

As mentioned before, our procedure performs the tests $(H_k)_{k\ge 0}$ in turn until the null hypothesis is accepted. With this choice of the collections $(V_k)_{0\le k<p}$ and $(\mathcal{Q}_{k,t_{\max}})_{0\le k<p}$, our estimate of $k_0$ is $\hat k = \inf\{k \ge 0,\ H_k \text{ is accepted}\}$. The estimated set of relevant variables is then $\hat J = \{1,\dots,\hat k\}$.

We now introduce a procedure to test the null hypothesis $H_k$. For all $\alpha \in ]0,1[$ and all $k < p$, set

$$T_{k,\alpha} = \sup_{t \in T_k}\left\{ \frac{N_{k,t}\,\|\Pi_{S_{k,t}} Y\|_n^2}{D_{k,t}\,\|Y - \Pi_{V_{k,t}} Y\|_n^2} - \bar F^{-1}_{D_{k,t},N_{k,t}}(\alpha_t) \right\} \tag{3}$$

where $\{\alpha_t,\ t \in T_k\}$ is a collection of numbers in $]0,1[$ such that

$$\forall \mu \in V_k,\quad \mathbb{P}_\mu(T_{k,\alpha} > 0) \le \alpha. \tag{4}$$

The null hypothesis $H_k$ is rejected when $T_{k,\alpha}$ is positive.
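The bookkeeping behind the dyadic collection (2) and the stopping rule $\hat k = \inf\{k \ge 0,\ H_k \text{ is accepted}\}$ can be sketched as follows. This is an illustration only, and the function names are hypothetical. The test of $H_k$ is abstracted as a callback `rejects(k)`, which could for instance return `True` when $T_{k,\alpha} > 0$. Returning $p$ when every $H_k$ with $k < p$ is rejected is our own convention.

```python
def dyadic_collection(n, p, k):
    """Indices and dimensions of S_{k,t}, t in T_k = {0, ..., t_max}.

    Assumes linearly independent ordered variables with p < n, so that
    D_{k,t} = 2^t and N_{k,t} = n - (k + 2^t). Indices are 1-based.
    """
    t_max = (p - k).bit_length() - 1  # exact floor(log2(p - k)) for p - k >= 1
    collection = []
    for t in range(t_max + 1):
        D = 2 ** t
        collection.append({
            "t": t,
            "indices": list(range(k + 1, k + D + 1)),  # X_{k+1}, ..., X_{k+2^t}
            "D": D,
            "N": n - (k + D),
        })
    return collection


def estimate_k0(p, rejects):
    """hat k = inf{k >= 0 : H_k is accepted}; rejects(k) is True if H_k is rejected."""
    for k in range(p):
        if not rejects(k):
            return k
    return p  # convention (assumption): all H_k with k < p were rejected


coll = dyadic_collection(n=20, p=10, k=3)
print([(c["D"], c["N"]) for c in coll])  # [(1, 16), (2, 15), (4, 13)]
print(estimate_k0(10, lambda k: k < 3))  # 3
```

Using `bit_length` rather than a floating-point `log2` keeps $t_{\max}$ exact for every integer $p - k \ge 1$.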

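We choose the collection $\{\alpha_t,\ t \in T_k\}$ according to one of the two following procedures:

P1. For all $t \in T_k$, $\alpha_t = \alpha_n$, where $\alpha_n$ is the $\alpha$-quantile of the random variable

$$\inf_{t \in T_k} \bar F_{D_{k,t},N_{k,t}}\left( \frac{N_{k,t}\,\|\Pi_{S_{k,t}}\varepsilon\|_n^2}{D_{k,t}\,\|\varepsilon - \Pi_{V_{k,t}}\varepsilon\|_n^2} \right).$$

P2. The $\alpha_t$'s satisfy the inequality

$$\sum_{t \in T_k} \alpha_t \le \alpha.$$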
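The quantities $\bar F_{D,N}$, $\bar F^{-1}_{D,N}$, P1 and P2 can be computed with a few lines of code. The sketch below is a minimal, standard-library-only illustration, not the authors' implementation, and all function names are hypothetical:

- $\bar F_{D,N}$ is evaluated through the regularized incomplete beta function (a continued fraction with the modified Lentz method).
- $\bar F^{-1}_{D,N}$ is found by bisection.
- P2 is instantiated with the equal split $\alpha_t = \alpha/|T_k|$, which is only one admissible choice.
- P1 is approximated by Monte Carlo.

The Monte Carlo step for P1 relies on the following fact. Under $H_k$, $\Pi_{S_{k,t}}Y = \Pi_{S_{k,t}}\varepsilon$ and $Y - \Pi_{V_{k,t}}Y = \varepsilon - \Pi_{V_{k,t}}\varepsilon$. Because the spaces $S_{k,t}$ are nested and orthogonal to $V_k$, the joint law of the ratios over $t$ can be simulated in an orthonormal basis adapted to $V_k \subset V_{k,0} \subset V_{k,1} \subset \dots$. This assumes $\dim S_{k,t} = 2^t$, as in the text.

```python
import math
import random


def _betacf(a, b, x, max_iter=500, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:  # convergence tested on the odd step
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_sf(u, d1, d2):
    """bar F_{d1,d2}(u): probability that a Fisher(d1, d2) variable exceeds u."""
    if u <= 0.0:
        return 1.0
    return reg_inc_beta(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * u))


def f_isf(alpha, d1, d2, iters=100):
    """bar F^{-1}_{d1,d2}(alpha) by bisection (f_sf is decreasing in u)."""
    lo, hi = 0.0, 1.0
    while f_sf(hi, d1, d2) > alpha:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f_sf(mid, d1, d2) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def alphas_p2(alpha, T):
    """P2 with an equal split (our choice): the alpha_t sum to alpha."""
    return {t: alpha / len(T) for t in T}


def alpha_n_p1(alpha, n, k, T, n_rep=2000, seed=0):
    """Monte Carlo estimate of alpha_n in P1 (D_{k,t} = 2^t, N_{k,t} = n - k - 2^t)."""
    rng = random.Random(seed)
    mins = []
    for _ in range(n_rep):
        g2 = [rng.gauss(0.0, 1.0) ** 2 for _ in range(n - k)]  # eps in V_k^perp coordinates
        total = sum(g2)
        m = 1.0
        for t in T:
            D, N = 2 ** t, n - k - 2 ** t
            num = sum(g2[:D])  # ||Pi_{S_{k,t}} eps||^2 / sigma^2
            m = min(m, f_sf((num / D) / ((total - num) / N), D, N))
        mins.append(m)
    mins.sort()
    return mins[max(0, math.ceil(alpha * n_rep) - 1)]  # empirical alpha-quantile


T = [0, 1, 2]  # T_k when p - k = 7
a_n = alpha_n_p1(0.05, n=30, k=3, T=T)
thresholds = {t: f_isf(a_n, 2 ** t, 30 - 3 - 2 ** t) for t in T}
```

As a sanity check, for $D = 2$ the survival function has the closed form $(1 + 2u/N)^{-N/2}$, which `f_sf(u, 2, N)` should reproduce.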



Procedure P1 gives a test of $H_k$ of size $\alpha$, whereas procedure P2 gives a test of $H_k$ of level $\alpha$. The final multiple testing procedure computes $T_{k,\alpha}$ for $k = 0, 1, \dots$ until $T_{k,\alpha}$ is non-positive, i.e. until $H_k$ is accepted. This procedure is proved to be powerful: the following theorem gives an upper bound on the probability of wrongly estimating $k_0$. For $k = 0,\dots,p-1$, $\gamma \in ]0,1[$ and all $t \in T_k$, let $L_t = \log(1/\alpha_t)$, $L = \log(2/\gamma)$, $m_t = 2\exp(4L_t/N_{k,t})$, and for $u > 0$ let

/ u u 

K,(u) = 1 + 2 , / + Init , 

V Nk,t Nk,, 



Ci(k,t) = 2.5(1 +K,(L,)Vm,) 



Dk,t + L, 
Nk,, ' 



C2(k,t) = 2.5^l+Kf{L) 
(m,K,(L) 



1 + 



Nk,, 



C3(k,t) = 2.5 



v5 



1 +2 



Dk^ 
Nk.,)' 



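Theorem 2.1. Let $Y$ obey model (1). Assume that $p < n$ and that $(X_i)_{1\le i\le p}$ is a linearly independent family. Denote by $J$ the set $\{j,\ \beta_j \neq 0\} = \{1,\dots,k_0\}$. Let $\gamma$ and $\alpha$ be fixed in $]0,1[$.

The testing procedure estimates $k_0$ by $\hat k = \inf\{k \ge 0,\ T_{k,\alpha} \le 0\}$, where $T_{k,\alpha}$ is defined by (3). Let $\{\alpha_t,\ t \in T_k\}$ be calculated according to procedure P1 or P2.
Suppose that, for all $k \le k_0 - 1$, the following condition $(R_k)$ holds:

$$(R_k):\ \exists t \in T_k,\quad \|\Pi_{S_{k,t}}(\mu)\|_n^2 > C_1(k,t)\,\|\Pi_{V_{k,t}^\perp}(\mu)\|_n^2 + \frac{\sigma^2}{n}\left[C_2(k,t)\sqrt{2^t \log\left(\frac{2k_0}{\alpha_t\gamma}\right)} + C_3(k,t)\log\left(\frac{2k_0}{\alpha_t\gamma}\right)\right].$$

Then

$$\mathbb{P}_\mu(\hat k \neq k_0) \le \gamma + \alpha. \tag{5}$$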
Remark 2.2. For fixed $k$, $C_1(k,t)$, $C_2(k,t)$ and $C_3(k,t)$ behave like constants if the following conditions are verified:

for all $t \in T_k$, $\alpha_t \ge \exp(-N_{k,t}/10)$, $\gamma \ge 2\exp(-N_{k,t}/21)$, and the ratio $D_{k,t}/N_{k,t}$ remains bounded.

Under these conditions, the following inequalities hold:

$$C_1(k,t) \le 10\,\frac{D_{k,t} + \log(1/\alpha_t)}{N_{k,t}},$$



C2(k,t)<5 



1 + 



Dk, 



"iNk^j 



Ci{k,t) < 12.5 1 + 2 



'Nk,, 
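The sufficient conditions of Remark 2.2 are easy to check numerically for a given $k$. The helper below is a hypothetical sketch. It only encodes the two inequalities on $\alpha_t$ and $\gamma$ stated in the remark, with $N_{k,t} = n - (k + 2^t)$ as in Section 2.1. It does not check that $D_{k,t}/N_{k,t}$ remains bounded, which is not a finite-sample check.

```python
import math


def remark_2_2_holds(n, k, alphas, gamma):
    """True if alpha_t >= exp(-N_{k,t}/10) and gamma >= 2 exp(-N_{k,t}/21) for all t.

    `alphas` maps t to alpha_t; N_{k,t} = n - (k + 2^t) as in Section 2.1.
    """
    for t, a_t in alphas.items():
        N = n - (k + 2 ** t)
        if a_t < math.exp(-N / 10.0) or gamma < 2.0 * math.exp(-N / 21.0):
            return False
    return True


alphas = {t: 0.05 / 3 for t in (0, 1, 2)}  # equal-split P2 choice, |T_k| = 3
print(remark_2_2_holds(200, 3, alphas, gamma=0.05))  # True
print(remark_2_2_holds(12, 3, alphas, gamma=0.05))   # False
```

With small $n$, $\exp(-N_{k,t}/10)$ is no longer negligible, which is why the second call fails.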



It is important to note that Theorem 2.1 is non-asymptotic, which is the strength of our procedure. Baraud et al. [2] proposed an adaptive procedure to test, in model (1), whether $\mu = X\beta$ belongs to some linear subspace $V$ of $\mathbb{R}^n$. Our variable selection procedure and the results of Theorem 2.1 are inspired by this paper. An asymptotic property can also be deduced, which proves the consistency of our estimator $\hat k$ of $k_0$.

Corollary 2.3. Assume that $p < An$ with $A < 1$. Then, using the same notation as in Theorem 2.1, we obtain

$$\mathbb{P}_\mu(\hat k \neq k_0) \xrightarrow[n \to \infty]{} 0. \tag{6}$$

Remark 2.4. We say that $\mu$ satisfies condition $(R)$ if $(R_k)$ holds for all $k \le k_0 - 1$. According to Theorem 2.1, our procedure is powerful under condition $(R)$. Condition $(R)$ implicitly constrains the coefficients $\beta_j$, since the projection of $Y$ onto a space spanned by a subset of the family $(X_i)_{1\le i\le p}$ depends both on $\beta$ and on the matrix $X$. These conditions on the $\beta_j$ appear explicitly when $(X_i)_{1\le i\le p}$ is an orthonormal family. Assume in the following that $(X_i)_{1\le i\le p}$ is an orthonormal family. Then (1) becomes:

$$Y = X_1\beta_1 + \dots + X_k\beta_k + X_{k+1}\beta_{k+1} + \dots + X_p\beta_p + \varepsilon. \tag{7}$$

With the decomposition (7), the projection of $\mu$ onto any subspace $S_{k,t}$ only depends on $(\beta_j)_{j \ge k+1}$. Thus condition $(R_k)$ can be written in a different form, making explicit use of the $\beta$'s:

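$$\exists t \in T_k,\quad \beta_{k+1}^2 + \dots + \beta_{k+2^t}^2 > C_1(k,t)\sum_{j=k+2^t+1}^{p}\beta_j^2 + \frac{\sigma^2}{n}\left[C_2(k,t)\sqrt{2^t\log\left(\frac{2k_0}{\alpha_t\gamma}\right)} + C_3(k,t)\log\left(\frac{2k_0}{\alpha_t\gamma}\right)\right].$$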

2.2. The case p > n

Having established the properties of the procedure in the common case $p < n$, we now turn to a very important framework: the high-dimensional case, i.e. $p > n$. The family $(X_i)_{1\le i\le p}$ can no longer be linearly independent. We therefore assume that the decomposition of $\mu$ is unique, i.e. there exists a unique $J \subset \{1,\dots,p\}$ such that $\mu = \sum_{j \in J} X_j\beta_j$.

Recall that in this section $(X_i)_{1\le i\le p}$ is assumed to be ordered independently of $Y$, and that the relevant variables are assumed to come first. We therefore still have $J = \{1,\dots,k_0\}$ and $|J| = k_0$.
Let $a = \dim(\mathrm{span}(X_1,\dots,X_p))$, and note that $a \le n$. In this part, for all $k < a - 1$, we set $t_{\max} = \lfloor \log_2(a - k - 1) \rfloor$. This ensures that $N_{k,t} > 0$, so that $T_{k,\alpha}$ defined by (3) can be computed for all $k < a - 1$.

Denote $V_k = \mathrm{span}(X_1,\dots,X_{s_k})$, where $s_k = \inf\{s,\ \dim(\mathrm{span}(X_1,\dots,X_s)) = k\}$. Denote also $S_{k,t} = \mathrm{span}(X_{s_k+1},\dots,X_{s_k+q_{k,t}}) \cap V_k^\perp$, where $q_{k,t} = \inf\{q,\ \dim(\mathrm{span}(X_{s_k+1},\dots,X_{s_k+q})) = 2^t\}$.
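The indices $s_k$ and $q_{k,t}$ only require the rank profile of the ordered columns. The following is a minimal sketch under stated assumptions. Columns are given as lists of floats, and linear dependence is decided by Gram-Schmidt with an absolute tolerance, which is a numerical choice of ours rather than something taken from the text. The function names are hypothetical.

```python
def _residual(v, basis):
    """Part of v orthogonal to the orthonormal vectors in `basis`."""
    r = list(v)
    for q in basis:
        c = sum(a * b for a, b in zip(r, q))
        r = [a - c * b for a, b in zip(r, q)]
    return r


def rank_profile(columns, tol=1e-10):
    """dims[s] = dim span(X_1, ..., X_s) for s = 0, ..., len(columns)."""
    basis, dims = [], [0]
    for v in columns:
        r = _residual(v, basis)
        norm = sum(a * a for a in r) ** 0.5
        if norm > tol:  # X_s adds a new direction
            basis.append([a / norm for a in r])
        dims.append(len(basis))
    return dims


def first_index_with_dim(columns, target):
    """inf{s : dim span(first s columns) = target}."""
    for s, d in enumerate(rank_profile(columns)):
        if d == target:
            return s
    raise ValueError("target exceeds the dimension of the span")


def s_k(columns, k):
    return first_index_with_dim(columns, k)


def q_kt(columns, k, t):
    """inf{q : dim span(X_{s_k+1}, ..., X_{s_k+q}) = 2^t}."""
    return first_index_with_dim(columns[s_k(columns, k):], 2 ** t)


cols = [[1, 0, 0], [2, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]]
print(rank_profile(cols))              # [0, 1, 1, 2, 3, 3]
print(s_k(cols, 2), q_kt(cols, 1, 1))  # 3 2
```

In this toy example $X_2$ and $X_5$ add no new direction, so $a = 3$ even though $p = 5$.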

The condition $(R_k)$ in Theorem 2.1 places no restriction on the growth of $p$. Thus Theorem 2.1 applies with the new notation for any $p > n$, provided that $k_0 < a$. This is not a strong restriction, since $k_0 > n$ would mean that we do not have enough observations to estimate $k_0$, whatever the method.

Results from the simulation study in Section 4 illustrate the power of our procedure, both when $p < n$ and when $p > n$.

3. Non-ordered variable selection

In Section 2 we defined a procedure based on multiple hypotheses testing to estimate $J$, the set of relevant variables of the sparse linear model (1). Since the family $(X_i)_{1\le i\le p}$ was given an order independent of $Y$, estimating $J = \{1,\dots,k_0\}$ reduced to estimating $k_0$. This section is dedicated to a more common case: $(X_i)_{1\le i\le p}$ is no longer assumed to be ordered, so $J$ is not necessarily equal to $\{1,\dots,k_0\}$. We define here a general two-step procedure to estimate $J$. The first step orders the variables, and the second step estimates $k_0$. After the first step of the general procedure, the ordered variables are denoted $X_{(1)},\dots,X_{(p)}$.

The first step of our procedure consists in ordering the variables. In this paper we propose two ways to order $(X_i)_{1\le i\le p}$ that take the observations $Y$ into account:

- Variables ordered by increasing p-values. When $p < n$, a p-value is calculated for each variable using the least squares estimate, and the variables are then sorted by increasing p-value. When $p > n$, a p-value is calculated for each variable from the regression of $Y$ onto that variable alone.

- Variables ordered with the Bolasso technique, introduced by Bach [1]. The Bolasso is a bootstrapped version of the Lasso that improves its stability: several independent bootstrap samples are generated and the Lasso is performed on each of them. This approach is proved to make the irrelevant variables disappear asymptotically. We modify the Bolasso to adapt it to non-asymptotic analysis. An appearance frequency is calculated for each variable $X_i$ by counting the number of times $X_i$ is selected over the bootstrap samples. At a given penalty, a high frequency indicates a good prediction ability of $X_i$. To avoid choosing a penalty, we let the penalty decrease and define the first ordered variable as the first one to reach a frequency of 1, and so on for the other variables. We proceed by dichotomy to order the variables.
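The p-value ordering in the case $p > n$ can be sketched as below. We assume a regression of $Y$ onto each single variable without intercept, a detail the text does not specify. All these regressions have $n - 1$ residual degrees of freedom. Sorting by increasing p-value is therefore the same as sorting by decreasing absolute t-statistic, and no t-distribution is needed to obtain the order. The function names are hypothetical.

```python
def marginal_t_stats(columns, Y):
    """t-statistic of the slope in the no-intercept regression of Y on each X_j."""
    n = len(Y)
    yy = sum(y * y for y in Y)
    stats = []
    for x in columns:
        xx = sum(v * v for v in x)
        xy = sum(a * b for a, b in zip(x, Y))
        slope = xy / xx
        rss = yy - xy * xy / xx  # residual sum of squares
        se = (rss / (n - 1) / xx) ** 0.5
        stats.append(slope / se)
    return stats


def order_by_marginal_pvalue(columns, Y):
    """1-based variable indices by increasing p-value (decreasing |t|)."""
    t = marginal_t_stats(columns, Y)
    return sorted(range(1, len(columns) + 1), key=lambda j: -abs(t[j - 1]))


Y = [1, 2, 3, 4, 5]
columns = [[1, 0, 1, 0, 1], [1, 2, 3, 4, 6], [0, 1, 0, 1, 0]]
print(order_by_marginal_pvalue(columns, Y))  # [2, 1, 3]
```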

The first method (ordered p-values) requires less computational time but, as shown in Section 4, the Bolasso technique gives a better order and thus better results. Indeed, as we will see throughout this section, the crucial step is the first one: ordering the variables correctly. The ability of this procedure to estimate $J$ depends on its ability to place the relevant variables first.
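The modified Bolasso ordering can be sketched as follows. This is a small-scale illustration under explicit assumptions, not the authors' code:

- The Lasso objective is taken as $\frac{1}{2n}\|Y - Xb\|^2 + \lambda\|b\|_1$ and is solved by coordinate descent on the Gram matrix.
- A variable "reaches frequency 1" at penalty $\lambda$ when it is selected in every bootstrap sample.
- For each variable, the dichotomy searches for the largest such $\lambda$. This assumes that the penalties at which the variable is always selected form an interval starting at 0.

Variables are then ordered by decreasing threshold. All names and default values are hypothetical.

```python
import random


def _gram(X_rows, y, idx):
    """G = X'X/n and c = X'y/n computed on the rows listed in idx."""
    n, p = len(idx), len(X_rows[0])
    G = [[sum(X_rows[i][a] * X_rows[i][b] for i in idx) / n for b in range(p)]
         for a in range(p)]
    c = [sum(X_rows[i][a] * y[i] for i in idx) / n for a in range(p)]
    return G, c


def _lasso_cd(G, c, lam, n_sweeps=100):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1 via (G, c)."""
    p = len(c)
    b = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            rho = c[j] - sum(G[j][m] * b[m] for m in range(p) if m != j)
            shrunk = max(abs(rho) - lam, 0.0)  # soft-thresholding
            b[j] = (shrunk if rho > 0 else -shrunk) / G[j][j]
    return b


def bolasso_order(X_rows, y, n_boot=16, n_bisect=20, seed=0):
    """1-based indices by decreasing largest penalty with bootstrap frequency 1."""
    rng = random.Random(seed)
    n, p = len(y), len(X_rows[0])
    samples = [_gram(X_rows, y, [rng.randrange(n) for _ in range(n)])
               for _ in range(n_boot)]
    # For lam >= max |c_j| the Lasso solution is 0, so nothing is selected.
    lam_hi = max(max(abs(v) for v in c) for _, c in samples)

    def always_selected(j, lam):
        return all(_lasso_cd(G, c, lam)[j] != 0.0 for G, c in samples)

    thresholds = []
    for j in range(p):
        lo, hi = 0.0, lam_hi  # dichotomy on the penalty
        for _ in range(n_bisect):
            mid = 0.5 * (lo + hi)
            if always_selected(j, mid):
                lo = mid
            else:
                hi = mid
        thresholds.append(lo)
    return sorted(range(1, p + 1), key=lambda j: -thresholds[j - 1])


rng = random.Random(1)
X_rows = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(40)]
y = [3.0 * r[2] + 1.0 * r[0] + rng.gauss(0.0, 0.5) for r in X_rows]
order = bolasso_order(X_rows, y)
```

In this simulated example, variable 3 carries the strongest signal, so one would expect it to be ordered first.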

From now on, we assume that the decomposition of $\mu$ is unique, i.e. there exists a unique $J \subset \{1,\dots,p\}$ such that $\mu = \sum_{j \in J} X_j\beta_j$. We have $|J| = k_0$. We now introduce an event that will be useful in the rest of this section:

$$A_k = \{\text{the } k \text{ first ordered variables are relevant}\} = \{\{(1),\dots,(k)\} \subset J\}. \tag{8}$$

On the event $A_{k_0}$, the $k_0$ first ordered variables are relevant, so $\{(1),\dots,(k_0)\} = J$. An estimate of $J$ is then obtained from an estimate of $|J| = k_0$.

The second step of the general procedure tests, for $k = 0, 1, \dots$ in turn, the null hypothesis

$$H_k:\ \{|J| = k\} \quad \text{against the alternative} \quad \{|J| > k\}. \tag{9}$$

The procedure stops when the null hypothesis is accepted. Our estimate of $k_0$ is $\hat k = \inf\{k \ge 0;\ H_k \text{ is accepted}\}$, and therefore $\hat J = \{(1),\dots,(\hat k)\}$ is our estimate of the set of relevant variables.

We distinguish two cases for testing the null hypothesis $H_k$, depending on whether $\sigma$ is known or not. The first step is the same in both cases. If $\sigma$ is known, a procedure 'A' is proved to be powerful under some conditions on the data. If $\sigma$ is unknown, we provide another two-step procedure 'B', which is also proved to be powerful under some conditions on the data.

3.1. The case p < n and σ is known

In this section, we define Procedure 'A' under the assumption that the variance $\sigma^2$ is known. Assume that $(X_i)_{1\le i\le p}$ is a linearly independent family and that the first step of Procedure 'A' has already been done, i.e. the variables have been ordered. The second step is a testing procedure, described below.

We adapt the notation of Section 2.1 to this section. For all $k \le p - 1$, let $t_{\max} = \lfloor \log_2(p-k) \rfloor$ and $T_k = \{0,\dots,t_{\max}\}$. We define $V_{(k)} = \mathrm{span}(X_{(1)},\dots,X_{(k)})$. Let $\mathcal{Q}_{(k),t_{\max}} = \{S_{(k),(t)},\ t \in T_k\}$ be a collection of linear subspaces of $V_{(k)}^\perp$, where $S_{(k),(t)} = \mathrm{span}(X_{(k+1)},\dots,X_{(k+2^t)}) \cap V_{(k)}^\perp$ for all $t \in T_k$. With this definition of $S_{(k),(t)}$, we have $\dim(S_{(k),(t)}) = D_{k,t} = 2^t$. Let us denote $V_{(k),(t)} = V_{(k)} \oplus S_{(k),(t)}$. Some notation of the previous section is also used.

3.1.1. The general case

We introduce new statistics: for all $0 \le k < p$ and all $t \in T_k$, let

$$U_{k,t} = \frac{\|\Pi_{S_{(k),(t)}} Y\|_n^2}{\sigma^2}.$$

The second step of Procedure 'A' performs several tests of the null hypothesis $H_k$ defined by (9), using the test statistics $U_{k,t}$. The tests are at levels $\alpha_t$, where $\{\alpha_t,\ t \in T_k\}$ is a suitable collection of numbers in $]0,1[$. The final multiple testing procedure rejects $H_k$ if one of those tests rejects it. We want the final test to be of level $\alpha$.

The distribution of the statistic $U_{k,t}$ is unknown, because the space $S_{(k),(t)}$ is random and depends on $Y$. An upper bound therefore has to be found in order to build the same kind of testing procedure as in Section 2.

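Let $k \in \{0,\dots,p-1\}$ and $\varepsilon' \sim \mathcal{N}_n(0, \sigma^2 I_n)$. The family $(X_i)_{1\le i\le p}$ is ordered by a permutation $\sigma_1$, defined as follows. For all $j \in \{1,\dots,k\}$, $\sigma_1(j) = (j)$. For all $j \in \{k+1,\dots,p\}$, $X_{\sigma_1(j)}$ is the variable that maximizes

$$\left\{\|\Pi_{X_l \cap \langle X_{\sigma_1(1)},\dots,X_{\sigma_1(j-1)}\rangle^\perp}\,\varepsilon'\|_n^2,\ l \in \{1,\dots,p\}\setminus\{\sigma_1(1),\dots,\sigma_1(j-1)\}\right\}.$$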
We can then calculate the statistics

$$U^{\sigma_1}_{k,t} = \frac{\|\Pi_{S_{(k),\sigma_1(t)}}\,\varepsilon'\|_n^2}{\sigma^2}, \quad \text{where } S_{(k),\sigma_1(t)} = \mathrm{span}(X_{\sigma_1(k+1)},\dots,X_{\sigma_1(k+2^t)}) \cap V_{(k)}^\perp.$$

Lemma 3.1. For all $0 \le k < p$ and all $t \in T_k$, under $H_k$ given by (9) and on the event $A_k$ defined by (8), $U_{k,t}$ is stochastically upper bounded by $U^{\sigma_1}_{k,t}$:

$$U_{k,t} \le_{\mathrm{st}} U^{\sigma_1}_{k,t}. \tag{10}$$

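Let $\bar U^{\sigma_1}_{k,t}(u)$ denote the probability for the statistic $U^{\sigma_1}_{k,t}$ to be larger than $u$. For all $\alpha \in ]0,1[$ and all $0 \le k < p$, set

$$M^1_{k,\alpha} = \sup_{t \in T_k}\left\{U_{k,t} - \left(\bar U^{\sigma_1}_{k,t}\right)^{-1}(\alpha_t)\right\} \tag{11}$$

where $\{\alpha_t,\ t \in T_k\}$ is a collection of numbers in $]0,1[$ chosen according to the following procedure:

P3. For all $t \in T_k$, $\alpha_t = \alpha_n$, where $\alpha_n$ is the $\alpha$-quantile of the random variable

$$\inf_{t \in T_k} \bar U^{\sigma_1}_{k,t}\left(U^{\sigma_1}_{k,t}\right).$$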




The null hypothesis $H_k$ is rejected when $M^1_{k,\alpha}$ is positive. The second step of Procedure 'A' thus computes $M^1_{k,\alpha}$ for $k = 0, 1, \dots$ until $M^1_{k,\alpha}$ is non-positive. Choosing the collection $\{\alpha_t,\ t \in T_k\}$ with procedure P3 gives a test of $H_k$ of level $\alpha$.
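Because the law of $U^{\sigma_1}_{k,t}$ depends on the design, the quantiles $(\bar U^{\sigma_1}_{k,t})^{-1}(\alpha_t)$ are naturally approximated by simulation. The sketch below is illustrative and rests on two assumptions of ours:

- The projection onto "$X_l \cap \langle X_{\sigma_1(1)},\dots,X_{\sigma_1(j-1)}\rangle^\perp$" is read as the projection onto the direction of $X_l$ that is orthogonal to the variables already chosen.
- $\|\Pi_{S_{(k),\sigma_1(t)}}\varepsilon'\|^2$ is taken as the sum of the first $2^t$ greedy gains after the $k$ fixed variables.

Function names, the Monte Carlo size and the empirical-quantile rule are hypothetical.

```python
import random


def _orth(v, basis):
    """Unit vector along the part of v orthogonal to `basis`, or None if negligible."""
    r = list(v)
    for q in basis:
        c = sum(a * b for a, b in zip(r, q))
        r = [a - c * b for a, b in zip(r, q)]
    norm = sum(a * a for a in r) ** 0.5
    return [a / norm for a in r] if norm > 1e-10 else None


def greedy_noise_order(columns, first_k, e, n_steps):
    """sigma_1: keep first_k (1-based), then add greedily the variable whose new
    direction captures the most of e. Returns the order and the Euclidean gains."""
    basis, order, gains = [], list(first_k), []
    for j in first_k:
        q = _orth(columns[j - 1], basis)
        if q is not None:
            basis.append(q)
    remaining = [j for j in range(1, len(columns) + 1) if j not in first_k]
    for _ in range(n_steps):
        best = None
        for j in remaining:
            q = _orth(columns[j - 1], basis)
            if q is None:
                continue
            g = sum(a * b for a, b in zip(q, e)) ** 2  # squared projection norm
            if best is None or g > best[0]:
                best = (g, j, q)
        if best is None:
            break
        g, j, q = best
        order.append(j)
        basis.append(q)
        gains.append(g)
        remaining.remove(j)
    return order, gains


def u_sigma1_quantile(columns, first_k, t, sigma, alpha, n_rep=500, seed=0):
    """Monte Carlo value u with P(U^{sigma_1}_{k,t} > u) close to alpha."""
    rng = random.Random(seed)
    n = len(columns[0])
    draws = []
    for _ in range(n_rep):
        e = [rng.gauss(0.0, sigma) for _ in range(n)]
        _, gains = greedy_noise_order(columns, first_k, e, 2 ** t)
        draws.append(sum(gains) / n / sigma ** 2)  # ||.||_n^2 / sigma^2
    draws.sort()
    return draws[min(n_rep - 1, int((1.0 - alpha) * n_rep))]


cols = [[1.0 if i == j else 0.0 for i in range(4)] for j in range(4)]  # orthonormal
order, gains = greedy_noise_order(cols, [1], [0.5, -3.0, 1.0, 2.0], n_steps=3)
print(order, gains)  # [1, 2, 4, 3] [9.0, 4.0, 1.0]
```

With an orthonormal design, the greedy gains are simply the largest squared noise coordinates, which anticipates the orthonormal case treated in Section 3.1.2.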

In summary, when $\sigma$ is known, the two-step Procedure 'A' orders the $p$ variables and then estimates $J$ by $\hat J = \{(1),\dots,(\hat k_A)\}$, where $\hat k_A = \inf\{k \ge 0;\ M^1_{k,\alpha} \le 0\}$. The testing procedure 'A' is proved to be powerful. The next theorem gives an upper bound on the probability of wrongly estimating $J$.

Theorem 3.2. Let $Y$ obey model (1). Assume that $p < n$ and that $(X_i)_{1\le i\le p}$ is a linearly independent family. Denote by $J$ the set $\{j,\ \beta_j \neq 0\}$. Let $\alpha$ and $\gamma$ be fixed in $]0,1[$.
The procedure estimates $J$ by $\hat J = \{(1),\dots,(\hat k_A)\}$, where $\hat k_A = \inf\{k \ge 0,\ M^1_{k,\alpha} \le 0\}$. Here $M^1_{k,\alpha}$ is defined by (11), and $\{\alpha_t,\ t \in T_k\}$ is calculated according to procedure P3.



We consider the condition $(R_{2,k})$: there exists $t \le \log_2(k_0 - k)$ such that

2' 



\inf^\nstAtS e B2] 



20-2 



10 + 4log 



ip - k)ko 
22' 



+ 



2'+Uog 



ko\n 

ya 



+ log 



' ko\n\ 



where, for all $d \le k_0$, $B_d = \{\mathrm{span}(X_i,\ i \in I),\ I \subset J,\ |I| = d\}$, and $|T_k| = \log_2(p-k) + 1$.



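If the condition $(R_{2,k})$ holds for all $k \le k_0 - 1$, then

$$\mathbb{P}_\mu(\hat J \neq J) \le \gamma + \alpha + \delta \tag{12}$$

where $\delta = \mathbb{P}_\mu(A_{k_0}^c) = \mathbb{P}_\mu(\exists j \le k_0,\ \beta_{(j)} = 0)$.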

This theorem is non-asymptotic and shows that ordering the variables correctly is a crucial step. Indeed, $\delta$ measures the cost of the chosen order: if the $k_0$ relevant variables are not ranked first after the first step of the procedure, then we will not have $\hat J = J$.
3.1.2. The particular case where (X_i)_i is an orthonormal family

When the family $(X_i)_{1\le i\le p}$ is orthonormal, the upper bound on the statistics $U_{k,t}$ in Lemma 3.1 can be expressed differently.

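Let $Q > 0$ and let $W_1,\dots,W_Q$ be $Q$ i.i.d. standard Gaussian variables, ordered as $W_{(1)}^2 \ge \dots \ge W_{(Q)}^2$. For all $d \le Q$, we define

$$Z_{d,Q} = \sum_{i=1}^{d} W_{(i)}^2. \tag{13}$$

Let $\bar Z_{d,Q}(u)$ denote the probability for the statistic $Z_{d,Q}$ to be larger than $u$.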

Lemma 3.3. For all $0 \le k < p$ and all $t \in T_k$, under $H_k$ and on the event $A_k$, $U_{k,t}$ is stochastically upper bounded as follows:

$$U_{k,t} \le_{\mathrm{st}} \frac{Z_{D_{k,t},\,p-k}}{n}.$$

For all $\alpha \in ]0,1[$ and all $0 \le k < p$, set

$$M_{k,\alpha} = \sup_{t \in T_k}\left\{U_{k,t} - \frac{1}{n}\,\bar Z^{-1}_{D_{k,t},\,p-k}(\alpha_t)\right\} \tag{14}$$

where $\{\alpha_t,\ t \in T_k\}$ is a collection of numbers in $]0,1[$ chosen according to the following procedure:

P4. For all $t \in T_k$, $\alpha_t = \alpha_n$, where $\alpha_n$ is the $\alpha$-quantile of the random variable

$$\inf_{t \in T_k} \bar Z_{D_{k,t},\,p-k}\left(Z_{D_{k,t},\,p-k}\right).$$

The null hypothesis $H_k$ is rejected when $M_{k,\alpha}$ is positive. Procedure P4 gives a test of $H_k$ of level $\alpha$. The major benefit of Procedure 'A' when $(X_i)_{1\le i\le p}$ is orthonormal is that the upper bound on the statistics $U_{k,t}$ in Lemma 3.3 depends neither on the family $(X_i)_{1\le i\le p}$ nor on its order. Thus, for fixed $p$ and $n$, the calculation in P4 only depends on $k$ and $t$.
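In the orthonormal case, the thresholds in $M_{k,\alpha}$ only involve the law of $Z_{d,Q}$, the sum of the $d$ largest of $Q$ squared standard Gaussian variables, which is easy to simulate. The following is a minimal Monte Carlo sketch. The function names, the number of replications and the empirical-quantile rule are our own choices.

```python
import random


def z_draw(d, Q, rng):
    """One draw of Z_{d,Q}: sum of the d largest of Q squared N(0, 1) variables."""
    w2 = sorted((rng.gauss(0.0, 1.0) ** 2 for _ in range(Q)), reverse=True)
    return sum(w2[:d])


def z_isf(d, Q, alpha, n_rep=4000, seed=0):
    """Monte Carlo estimate of bar Z^{-1}_{d,Q}(alpha)."""
    rng = random.Random(seed)
    draws = sorted(z_draw(d, Q, rng) for _ in range(n_rep))
    return draws[min(n_rep - 1, int((1.0 - alpha) * n_rep))]


# Threshold compared with U_{k,t}: bar Z^{-1}_{D_{k,t}, p-k}(alpha_t) / n.
n, p, k, t, alpha_t = 50, 40, 2, 2, 0.01
threshold = z_isf(2 ** t, p - k, alpha_t) / n
```

Because the threshold depends only on $(D_{k,t}, p - k, n)$, it can be tabulated once for given $p$ and $n$, which is the practical benefit noted above.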

The following corollary covers the particular case where $(X_i)_{1\le i\le p}$ is an orthonormal family, and makes explicit use of the $\beta$'s.

Corollary 3.4. Let $Y$ obey model (1). Assume that $p < n$ and that $(X_i)_{1\le i\le p}$ is an orthonormal family. Denote by $J$ the set $\{j,\ \beta_j \neq 0\}$. Let $\alpha$ and $\gamma$ be fixed in $]0,1[$.
The procedure estimates $J$ by $\hat J = \{(1),\dots,(\hat k_{A,bis})\}$, where $\hat k_{A,bis} = \inf\{k \ge 0,\ M_{k,\alpha} \le 0\}$. Here $M_{k,\alpha}$ is defined by (14), and $\{\alpha_t,\ t \in T_k\}$ is calculated according to procedure P4.



We consider the condition $(R_{2bis,k})$: there exists $t \le \log_2(k_0 - k)$ such that

2 

2o-2 



ko\n 

ya 



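where $\sigma_2$ is defined by $|\beta_{\sigma_2(1)}| \le \dots \le |\beta_{\sigma_2(k_0)}|$, and $|T_k| = \log_2(p-k) + 1$.

If the condition $(R_{2bis,k})$ holds for all $k \le k_0 - 1$, then

$$\mathbb{P}_\mu(\hat J \neq J) \le \gamma + \alpha + \delta \tag{15}$$

where $\delta = \mathbb{P}_\mu(A_{k_0}^c) = \mathbb{P}_\mu(\exists j \le k_0,\ \beta_{(j)} = 0)$.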



3.2. The case p < n and σ is unknown

In this section, we define Procedure 'B' under the assumption that the variance $\sigma^2$ is unknown. Assume that $(X_i)_{1\le i\le p}$ is a linearly independent family and that the first step of Procedure 'B' has already been done, i.e. the variables have been ordered.

We reuse some notation of Section 3.1. For all $k < p$, let $t_{\max} = \lfloor \log_2(p-k) \rfloor$ and $T_k = \{0,\dots,t_{\max}\}$. We define $V_{(k)} = \mathrm{span}(X_{(1)},\dots,X_{(k)})$. Let $\mathcal{Q}_{(k),t_{\max}} = \{S_{(k),(t)},\ t \in T_k\}$ be a collection of linear subspaces of $V_{(k)}^\perp$, where $S_{(k),(t)} = \mathrm{span}(X_{(k+1)},\dots,X_{(k+2^t)}) \cap V_{(k)}^\perp$ for all $t \in T_k$.

We denote $V_{(k),(t)} = V_{(k)} \oplus S_{(k),(t)}$. With this definition of $S_{(k),(t)}$, we have $\dim(S_{(k),(t)}) = D_{k,t} = 2^t$ and $\dim(V_{(k),(t)}^\perp) = N_{k,t} = n - (k + 2^t)$.

3.2.1. The general case

We define the following statistics: for all $0 \le k < p$ and all $t \in T_k$,

$$U_{D_{k,t},N_{k,t}} = \frac{N_{k,t}\,\|\Pi_{S_{(k),(t)}} Y\|_n^2}{D_{k,t}\,\|Y - \Pi_{V_{(k),(t)}} Y\|_n^2}.$$

The second step of Procedure 'B' performs several tests of the null hypothesis $H_k$ defined by (9), using the test statistics $U_{D_{k,t},N_{k,t}}$. The tests are at levels $\alpha_t$, where $\{\alpha_t,\ t \in T_k\}$ is a suitable collection of numbers in $]0,1[$. As in Section 3.1.1, an upper bound on $U_{D_{k,t},N_{k,t}}$ needs to be found.



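Let $k \in \{0,\dots,p-1\}$ and $\varepsilon' \sim \mathcal{N}_n(0, \sigma^2 I_n)$. The family $(X_i)_{1\le i\le p}$ is ordered by a permutation $\sigma_1$, defined as follows. For all $j \in \{1,\dots,k\}$, $\sigma_1(j) = (j)$. For all $j \in \{k+1,\dots,p\}$, $X_{\sigma_1(j)}$ is the variable that maximizes

$$\left\{\|\Pi_{X_l \cap \langle X_{\sigma_1(1)},\dots,X_{\sigma_1(j-1)}\rangle^\perp}\,\varepsilon'\|_n^2,\ l \in \{1,\dots,p\}\setminus\{\sigma_1(1),\dots,\sigma_1(j-1)\}\right\}.$$

We can then calculate the statistics

$$T^{\sigma_1}_{k,t} = \frac{N_{k,t}\,\|\Pi_{S_{(k),\sigma_1(t)}}\,\varepsilon'\|_n^2}{D_{k,t}\,\|\varepsilon' - \Pi_{V_{(k),\sigma_1(t)}}\,\varepsilon'\|_n^2},$$

where $S_{(k),\sigma_1(t)} = \mathrm{span}(X_{\sigma_1(k+1)},\dots,X_{\sigma_1(k+2^t)}) \cap V_{(k)}^\perp$ and $V_{(k),\sigma_1(t)} = S_{(k),\sigma_1(t)} \oplus V_{(k)}$.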

Lemma 3.5. For all $0 \le k < p$ and all $t \in T_k$, under $H_k$ given by (9) and on the event $A_k$ defined by (8), $U_{D_{k,t},N_{k,t}}$ is stochastically upper bounded by $T^{\sigma_1}_{k,t}$:

$$U_{D_{k,t},N_{k,t}} \le_{\mathrm{st}} T^{\sigma_1}_{k,t}. \tag{16}$$

Let $\bar T_{k,t}(u)$ denote the probability for the statistic $T^{\sigma_1}_{k,t}$ to be larger than $u$. For all $\alpha \in ]0,1[$ and all $0 \le k < p$, set

$$M_{k,\alpha} = \sup_{t \in T_k}\left\{U_{D_{k,t},N_{k,t}} - \bar T^{-1}_{k,t}(\alpha_t)\right\} \tag{17}$$

where $\{\alpha_t,\ t \in T_k\}$ is a collection of numbers in $]0,1[$ chosen according to the following procedure:

P5. For all $t \in T_k$, $\alpha_t = \alpha_n$, where $\alpha_n$ is the $\alpha$-quantile of the random variable

$$\inf_{t \in T_k} \bar T_{k,t}\left(T^{\sigma_1}_{k,t}\right).$$

The null hypothesis $H_k$ is rejected when $M_{k,\alpha}$ is positive. The second step of Procedure 'B' thus computes $M_{k,\alpha}$ for $k = 0, 1, \dots$ until $M_{k,\alpha}$ is non-positive. Choosing the collection $\{\alpha_t,\ t \in T_k\}$ with procedure P5 gives a final test of $H_k$ of level $\alpha$.
In summary, when $\sigma$ is unknown, the two-step Procedure 'B' orders the $p$ variables and then estimates $J$ by $\hat J = \{(1),\dots,(\hat k_B)\}$, where $\hat k_B = \inf\{k \ge 0;\ M_{k,\alpha} \le 0\}$. The next theorem proves that the procedure is powerful, by giving an upper bound on the probability of wrongly estimating $J$ under some conditions on the data. We first introduce some notation used in the theorem:



L, = \og{\Tk\la), m, = exp(4L,/A^^,,), mp = exp 



Nk,, 



■log 



Dk,, 



, M = 2m,mp. Denote 



Dk,, 



Nkj 



Dk,, 



Ai(k, f) = , / 1 + — , A2(k, 0=1+ 2— M and Aiik, t) = 2Ai(fe, f) + A2(k, t) 



Nk,, 



Theorem 3.6. Let $Y$ obey model (1). Assume that $p < n$ and that $(X_i)_{1\le i\le p}$ is a linearly independent family. Denote by $J$ the set $\{j,\ \beta_j \neq 0\}$. Let $\alpha$ and $\gamma$ be fixed in $]0,1[$.
The procedure estimates $J$ by $\hat J = \{(1),\dots,(\hat k_B)\}$, where $\hat k_B = \inf\{k \ge 0,\ M_{k,\alpha} \le 0\}$. Here $M_{k,\alpha}$ is defined by (17), and $\{\alpha_t,\ t \in T_k\}$ is calculated according to procedure P5.

We consider the condition $(R_{3,k})$: there exists $t \le \log_2(k_0 - k)$ such that
Unf^\nsti\tS e B2] > 



i + cr^\2 + -log\^-!^ 



Aik, t) 
Nk,, 



where A(k, t) - 2' 
and Vd < ko, Bj — {span(Xi), I <Z J, 



2 + — + A3(^, t)log 

Nk,, 



eip - k) 



log2{p -k) + V 



(18) 



+ {l+A2ik,t))log 



d]. 



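If the condition $(R_{3,k})$ holds for all $k \le k_0 - 1$, then

$$\mathbb{P}_\mu(\hat J \neq J) \le \gamma + \alpha + \delta \tag{19}$$

where $\delta = \mathbb{P}_\mu(A_{k_0}^c) = \mathbb{P}_\mu(\exists j \le k_0,\ \beta_{(j)} = 0)$.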



This theorem is non-asymptotic and shows that, under some conditions on the data, the testing procedure 'B' presented in this section is powerful. As for Theorem 3.2 of Section 3.1.1, the first step of the procedure, the ordering of the variables, plays an important part in Theorem 3.6.



Remark 3.7. The condition $(R_{3,k})$ can be simplified under the assumption that $2^t \le (n-k)/2$ and $\log(p-k) \ge 1$. Indeed, in this case, the right-hand side of (18) is upper bounded by



Cm„,y,a,(r)2' 



logip ~ ^) ^ log(ko) 



Nk,, 



(20) 



where $C(\|\mu\|_n, \gamma, \alpha, \sigma)$ is a constant depending on $\|\mu\|_n$, $\gamma$, $\alpha$ and $\sigma$.



A simulation study in Section 4 shows that this testing procedure performs well when combined with a good way to order the variables, one that keeps $\delta$ small.
3.2.2. The particular case where (X_i)_{1≤i≤p} is an orthonormal family

When $(X_i)_{1\le i\le p}$ is an orthonormal family, the condition $(R_{3,k})$ of Theorem 3.6 can be expressed differently, making explicit use of the $\beta$'s. The new condition obtained in the orthonormal case is also easier to satisfy.

Corollary 3.8. Let $Y$ obey model (1). Assume that $p < n$ and that $(X_i)_{1\le i\le p}$ is an orthonormal family. Denote by $J$ the set $\{j,\ \beta_j \neq 0\}$. Let $\alpha$ and $\gamma$ be fixed in $]0,1[$.
The procedure estimates $J$ by $\hat J = \{(1),\dots,(\hat k_B)\}$, where $\hat k_B = \inf\{k \ge 0,\ M_{k,\alpha} \le 0\}$. Here $M_{k,\alpha}$ is defined by (17), and $\{\alpha_t,\ t \in T_k\}$ is calculated according to procedure P5.

We consider the condition $(R_{3bis,k})$: there exists $t \le \log_2(k_0 - k)$ such that

— y^' /?2 > 

2o-2 ^h^Po-.u) - 



A{k, t) 

Nk,r 



where A(k, t) - 2' 



2' 

2 + — + A3(^, t)log 



y 

e(p - k) 



+ (1 +K2{k,t))log 



logiip -k) + l 



a 



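and $\sigma_2$ is defined such that $|\beta_{\sigma_2(1)}| \le \dots \le |\beta_{\sigma_2(k_0)}|$.
If the condition $(R_{3bis,k})$ holds for all $k \le k_0 - 1$, then

$$\mathbb{P}_\mu(\hat J \neq J) \le \gamma + \alpha + \delta \tag{21}$$

where $\delta = \mathbb{P}_\mu(A_{k_0}^c) = \mathbb{P}_\mu(\exists j \le k_0,\ \beta_{(j)} = 0)$.

Remark 3.7 also holds in the particular case where $(X_i)_{1\le i\le p}$ is an orthonormal family.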



3.3. The case p > n

We now discuss the high-dimensional case with non-ordered variables, $p > n$. This section adapts the two-step procedures introduced above to high-dimensional analysis. Verzelen [15] shows that when $k_0\log(ep/k_0)/[n\log(n)] > 4$, the so-called ultra-high dimensional case, it is almost impossible to estimate the support of $\beta$. We therefore assume that we are not in the ultra-high dimensional case.

The family $(X_i)_{1\le i\le p}$ is now linearly dependent. As stated at the beginning of Section 3, we assume that the decomposition of $\mu$ is unique, i.e. there exists a unique $J \subset \{1,\dots,p\}$ such that $\mu = \sum_{j \in J} X_j\beta_j$. We still have $|J| = k_0$. The general procedure defined at the beginning of Section 3 remains the same: the first step orders the variables and the second step estimates $k_0$.

Procedure 'A', defined in Section 3.1 for known variance, and Procedure 'B', defined in Section 3.2 for unknown variance, still apply after some minor modifications. The first modification concerns the definition of the subspaces $V_{(k)}$ and $S_{(k),(t)}$, which must take into account that $(X_i)_{1\le i\le p}$ is a dependent family. Let $a = \dim(\mathrm{span}(X_1,\dots,X_p))$, and note that $a \le n$. For all $k < a - 1$, we set $t_{\max} = \lfloor \log_2(a - k - 1) \rfloor$ and define:

- $V_{(k)} = \mathrm{span}(X_{(1)},\dots,X_{(s_k)})$, where $s_k = \inf\{s,\ \dim(\mathrm{span}(X_{(1)},\dots,X_{(s)})) = k\}$;
- $S_{(k),(t)} = \mathrm{span}(X_{(s_k+1)},\dots,X_{(s_k+q_{k,t})}) \cap V_{(k)}^\perp$, where $q_{k,t} = \inf\{q,\ \dim(\mathrm{span}(X_{(s_k+1)},\dots,X_{(s_k+q)})) = 2^t\}$.

With these definitions, we have $\dim(V_{(k)}) = k$ and $\dim(S_{(k),(t)}) = 2^t$.

The second modification concerns the construction of the upper bounds $\bar U_{k,t}$ and $\bar T_{k,t}$ in Lemma 3.1 and Lemma 3.3 respectively. Indeed, as $p > n$, these upper bounds are constructed as follows:

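Let $k \in \{0,\dots,a-2\}$ and $e' \sim \mathcal{N}_n(0,\sigma^2 I_n)$. The family $(X_i)_{1\le i\le p}$ is ordered by $\sigma_1$, defined as follows: for all $j \in \{1,\dots,s_k\}$, $\sigma_1(j) = (j)$, and for all $j \in \{s_k+1,\dots,p\}$, $X_{\sigma_1(j)}$ is the variable that maximizes

$$\left\{\left\|\Pi_{\langle X_{\sigma_1(1)},\dots,X_{\sigma_1(j-1)},X_l\rangle \cap \langle X_{\sigma_1(1)},\dots,X_{\sigma_1(j-1)}\rangle^{\perp}}\, e'\right\|^2,\ l \in \{1,\dots,p\}\setminus\{\sigma_1(1),\dots,\sigma_1(j-1)\}\right\}.$$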

Once we have $j$ such that $\dim(\langle X_{\sigma_1(1)},\dots,X_{\sigma_1(j)}\rangle) = a$, the next ordered variables $X_{\sigma_1(j+1)},\dots$ can be any of the remaining variables in $\{1,\dots,p\}\setminus\{\sigma_1(1),\dots,\sigma_1(j)\}$. We could complete $\sigma_1$ with an arbitrary order on the remaining variables, but since we have already constructed a family $(X_{\sigma_1(1)},\dots,X_{\sigma_1(j)})$ that spans $W$, the order of the remaining variables does not matter.
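The greedy step that defines $\sigma_1$ can be sketched in Python with NumPy. This is an illustration under our own assumptions: the helper name `order_by_noise_projection`, the tolerance `tol` used to decide whether a variable adds a new direction, and passing the first $s_k$ indices as a list are hypothetical choices. At each step the score of $X_l$ is the squared norm of the projection of $e'$ onto the normalized part of $X_l$ orthogonal to the variables already ordered.

```python
import numpy as np

def order_by_noise_projection(X, first, e_prime, tol=1e-10):
    """Greedy ordering sigma_1 (hypothetical helper, a sketch).

    X       : (n, p) design matrix, columns X_1..X_p
    first   : indices kept first, in their given order (sigma_1(j) = (j), j <= s_k)
    e_prime : (n,) draw of e' ~ N_n(0, sigma^2 I_n)
    tol     : assumed threshold below which a residual counts as zero
    Returns the ordered indices, stopping once they span a space of dimension a.
    """
    n, p = X.shape
    a = np.linalg.matrix_rank(X)            # a = dim span(X_1, ..., X_p)
    order = list(first)
    remaining = [l for l in range(p) if l not in order]

    # Orthonormal basis Q of the span of the variables already ordered.
    Q = np.zeros((n, 0))
    for j in order:
        r = X[:, j] - Q @ (Q.T @ X[:, j])
        nr = np.linalg.norm(r)
        if nr > tol:
            Q = np.column_stack([Q, r / nr])

    while Q.shape[1] < a and remaining:
        best, best_val, best_dir = None, -1.0, None
        for l in remaining:
            r = X[:, l] - Q @ (Q.T @ X[:, l])   # part of X_l orthogonal to span(Q)
            nr = np.linalg.norm(r)
            if nr <= tol:
                continue                         # X_l adds no new direction
            u = r / nr
            val = float(u @ e_prime) ** 2        # squared norm of the projection of e'
            if val > best_val:
                best, best_val, best_dir = l, val, u
        if best is None:
            break
        order.append(best)
        remaining.remove(best)
        Q = np.column_stack([Q, best_dir])
    return order
```

The loop stops as soon as the ordered variables span a space of dimension $a$, in line with the remark that the order of the remaining variables does not matter.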



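We can then calculate the statistics:

where $S_{(k),\sigma_1,t} = \mathrm{span}(X_{\sigma_1(s_k+1)},\dots,X_{\sigma_1(s_k+r_{k,t})}) \cap V_{(k),\sigma_1}^{\perp}$, where $r_{k,t}$ is defined by

$$r_{k,t} = \inf\{q : \dim(\mathrm{span}(X_{\sigma_1(s_k+1)},\dots,X_{\sigma_1(s_k+q)})) = 2^t\},$$

and $V_{(k),\sigma_1} = V_{(k)}$.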



With these two modifications, Theorem 3.2 of Section 3.1 and Theorem 3.6 of Section 3.2 apply, assuming $k_0 < a$. The simulation study in the next section shows that our procedure 'B' performs well.



4. Simulation study 

In this section, we comment on the results of the simulation study, which are presented in Appendix A. Our aim was to test the performance of our selection methods. Six methods were compared: the procedure described in Section 2 with ordered variables, denoted "pre-ordered" in the tables of Appendix A; the two-step procedure 'B' described in Section 3 with non-ordered variables, either with the variables ordered by p-values, denoted "procpval", or with the Bolasso order, denoted "procbol"; the FDR procedure described in Bunea et al. [6]; the Lasso; and the Bolasso technique. For the purpose of comparison, we used the simulation design of Bunea et al. [6]. The first method was not compared with the others, since such a comparison would be unfair: it relies on prior information about the relative importance of the variables. The two kinds of methods therefore have to be compared separately.

The simulation was performed in several frameworks: the common case, where $(X_i)_{1\le i\le p}$ is a linearly independent family; a more specific case, the orthonormal case; and a more general case, the high-dimensional case. In the latter, the FDR procedure of Bunea et al. [6] cannot be computed, as p-values cannot be obtained from the least squares estimate with all $p$ variables. In this case we used an adjusted FDR (denoted FDR2): a p-value was calculated for each variable $X_j$ from the regression of $Y$ onto that variable alone. As mentioned in the introduction, this is a natural extension of the FDR procedure to high-dimensional analysis, and this extended FDR is currently widely used in biology for QTL research and transcriptome analysis.
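A minimal sketch of FDR2 follows. Assumptions not stated in the text: the step-up rule is the Benjamini–Hochberg procedure at level $q$, each marginal regression of $Y$ on $X_j$ has no intercept, and the two-sided p-value uses a normal approximation to the Student distribution so that the sketch needs only NumPy and the standard library; the function names are hypothetical.

```python
import math
import numpy as np

def marginal_pvalues(X, Y):
    """Two-sided p-values for the slope of Y regressed on each single column X_j.

    Assumptions: no intercept, and a normal approximation to the t statistic.
    """
    n, p = X.shape
    pvals = np.empty(p)
    for j in range(p):
        x = X[:, j]
        b = (x @ Y) / (x @ x)                 # least-squares slope of Y on X_j
        resid = Y - b * x
        s2 = (resid @ resid) / (n - 1)        # residual variance, one fitted parameter
        t = b / math.sqrt(s2 / (x @ x))       # t statistic of the slope
        pvals[j] = math.erfc(abs(t) / math.sqrt(2.0))  # = 2 * (1 - Phi(|t|))
    return pvals

def benjamini_hochberg(pvals, q):
    """Indices selected by the Benjamini-Hochberg step-up rule at level q (assumed rule)."""
    p = len(pvals)
    order = np.argsort(pvals)
    thresholds = q * np.arange(1, p + 1) / p
    below = np.nonzero(pvals[order] <= thresholds)[0]
    if below.size == 0:
        return set()
    k = below[-1] + 1                          # largest i with p_(i) <= q * i / p
    return set(order[:k].tolist())
```

Because each p-value only involves one variable, this variant can be computed whatever the ratio $p/n$, which is why it remains available in the high-dimensional case.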

Concerning the design of our simulations, we simulated $p$ independent vectors $X_j^* \sim \mathcal{N}_n(0, I_n)$ and set the predictors $X_j = \dots$ for $j = 1,\dots,p$. The response variable $Y$ was computed via

$$Y = \beta_{i_1}X_{i_1} + \dots + \beta_{i_{k_0}}X_{i_{k_0}} + \epsilon,$$

where $\epsilon$ is a vector of independent standard Gaussian variables, $\{i_1,\dots,i_{k_0}\} = J \subset \{1,\dots,p\}$ and $\beta_j \in \{\sqrt{n}, 6\}$. We considered two values of $k_0$ (5 or 10). For each value, samples of size $n = 100$ and $n = 500$ were simulated in the low-dimensional case, and of size $n = 100$ in the high-dimensional case. We let $p$ vary with the sample size $n$. In the orthonormal case, we set the predictors $X_j$, $j = 1,\dots,p$, as an orthonormal basis of $\mathrm{span}(X_1^*,\dots,X_p^*)$ (principal components, for example). When $p > n$, the number of non-zero coefficients $|J|$ was checked. Indeed, we assume that the decomposition of $\mu$ is unique, so we had to check the possibility that, even though only $k_0$ non-zero coefficients were simulated, other variables might lie in $\mathrm{span}(X_{i_1},\dots,X_{i_{k_0}})$ because of collinearity. All variables lying in $\mathrm{span}(X_{i_1},\dots,X_{i_{k_0}})$ were searched for. In all simulations described above and reported in Tables A.1–A.3 of Appendix A, there were no variables in $J$ other than those used to simulate $Y$. Thus the aim of all simulations remained the same, i.e. the estimation of $J = \{i_1,\dots,i_{k_0}\}$.
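One replicate of this design can be sketched as below. Because the exact transformation from $X_j^*$ to $X_j$ is not recoverable here, the sketch uses the raw Gaussian columns in the non-orthonormal case; in the orthonormal case it takes the Q factor of a QR decomposition as the orthonormal basis of $\mathrm{span}(X_1^*,\dots,X_p^*)$, a substitute for the principal components mentioned above. Drawing $J$ uniformly at random and the helper name `simulate` are also assumptions.

```python
import numpy as np

def simulate(n, p, k0, beta, orthonormal=False, seed=None):
    """One replicate of Y = sum_{j in J} beta * X_j + eps (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))            # X*_j ~ N_n(0, I_n)
    # Assumption: raw columns are used as predictors in the non-orthonormal case.
    if orthonormal:
        if p > n:
            raise ValueError("an orthonormal family needs p <= n")
        X, _ = np.linalg.qr(X)                 # orthonormal basis of span(X*_1..X*_p)
    # Assumption: the support J is drawn uniformly among subsets of size k0.
    J = np.sort(rng.choice(p, size=k0, replace=False))
    b = np.zeros(p)
    b[J] = beta                                # beta in {sqrt(n), 6} in the tables
    Y = X @ b + rng.standard_normal(n)         # standard Gaussian noise
    return X, Y, J, b
```

For the orthonormal design, $X^TX = I_p$, so the conditioning parameter $m$ used in the tables equals $1/n$, i.e. 0.01 for $n = 100$, which is the value reported in Table A.1.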

When $(X_i)_{1\le i\le p}$ is not an orthonormal family, the calculation of $T_{k,\alpha}$ with (3) demands a lot of computational time, as a calculation of … and $Q_{k,t,\alpha}$ is needed for each $k$. Since a variable selection method is judged not only on its results but also on its speed, useless calculations in our procedure had to be avoided. The Gram–Schmidt process was used to obtain an orthonormal family from $(X_i)_{1\le i\le p}$, so that the subspaces $(V_k)_{k\ge0}$ were computed once and for all.
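Decompose, for all $k \ge 0$:

$$X_{k+1} = \Pi_{V_k}(X_{k+1}) + \Pi_{V_k^{\perp}}(X_{k+1}).$$

Let $(e_j)_{j=1,\dots,k}$ be an orthonormal basis of $V_k$; then:

$$\Pi_{V_k}(X_{k+1}) = \sum_{j=1}^{k}\langle X_{k+1},e_j\rangle\, e_j \quad\text{and}\quad \Pi_{V_k^{\perp}}(X_{k+1}) = X_{k+1} - \sum_{j=1}^{k}\langle X_{k+1},e_j\rangle\, e_j.$$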



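The family $(X_1,\dots,X_p)$ was modified into

$$\left(\frac{X_1}{\|X_1\|},\ \frac{\Pi_{V_1^{\perp}}(X_2)}{\|\Pi_{V_1^{\perp}}(X_2)\|},\ \frac{\Pi_{V_2^{\perp}}(X_3)}{\|\Pi_{V_2^{\perp}}(X_3)\|},\ \dots,\ \frac{\Pi_{V_{p-1}^{\perp}}(X_p)}{\|\Pi_{V_{p-1}^{\perp}}(X_p)\|}\right).$$

We called this orthonormal family $(\tilde X_1,\dots,\tilde X_p)$. $Y$ was decomposed as:

$$Y = \tilde X_1\tilde\beta_1 + \dots + \tilde X_k\tilde\beta_k + \tilde X_{k+1}\tilde\beta_{k+1} + \dots + \tilde X_p\tilde\beta_p + \epsilon \tag{22}$$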



Then $S_{k,t} = \mathrm{span}(\tilde X_{k+1},\dots,\tilde X_{k+2^t})$ and so $\|\Pi_{S_{k,t}}Y\|_n^2 = \tilde\beta_{k+1}^2 + \dots + \tilde\beta_{k+2^t}^2$. This technique avoided a lot of useless and redundant calculations.
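The orthonormalization and the resulting shortcut for $\|\Pi_{S_{k,t}}Y\|^2$ can be sketched as follows. Assumptions: classical Gram–Schmidt exactly as in the formulas above (a QR decomposition would be numerically more robust), linearly independent columns, the plain Euclidean norm (the document's $\|\cdot\|_n$ may include a $1/n$ factor), and $\tilde\beta_j = \langle Y,\tilde X_j\rangle$, i.e. the coordinates of $Y$ on the orthonormal family; the function names are hypothetical.

```python
import numpy as np

def gram_schmidt(X, tol=1e-12):
    """Classical Gram-Schmidt on the columns of X, in their given order.

    Column k of the result is Pi_{V_k^perp}(X_{k+1}) / ||Pi_{V_k^perp}(X_{k+1})||,
    with V_k = span(X_1, ..., X_k).
    """
    n, p = X.shape
    Xt = np.zeros((n, p))
    for k in range(p):
        v = X[:, k].copy()
        for j in range(k):
            v -= (X[:, k] @ Xt[:, j]) * Xt[:, j]   # subtract <X_{k+1}, e_j> e_j
        norm = np.linalg.norm(v)
        if norm < tol:
            raise ValueError("columns are linearly dependent")
        Xt[:, k] = v / norm
    return Xt

def projection_norm(Xt, Y, k, t):
    """||Pi_{S_{k,t}} Y||^2 with S_{k,t} = span(X~_{k+1}, ..., X~_{k+2^t}),
    computed as a sum of squared coordinates beta~_j = <Y, X~_j>."""
    beta_t = Xt.T @ Y                 # computed once, reused for every (k, t)
    return float(np.sum(beta_t[k:k + 2 ** t] ** 2))
```

Once `beta_t` is available, every $\|\Pi_{S_{k,t}}Y\|^2$ is a partial sum of squares, so no projection matrix has to be formed for each pair $(k,t)$.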

The Gram–Schmidt decomposition was also used in the non-ordered case, with the two-step procedures 'A' and 'B', once the variables had been ordered.

When $(X_i)_{1\le i\le p}$ is an orthonormal family, we used in our simulations an upper bound of the statistic $U_{D_{k,t},N_{k,t}}$ different from the one in Lemma 3.3. Indeed, we can obtain an upper bound that depends neither on the family $(X_i)_{1\le i\le p}$ nor on its order. Let $l_1,\dots,l_p$ be $p$ i.i.d. standard Gaussian variables, ordered as $|l_{(1)}| \ge \dots \ge |l_{(p)}|$. We define, for all $k = 0,\dots,p-1$ and all $D = 0,\dots,p-k-1$,

$$L_{k,D} = \sum_{i=k+D+1}^{p} l_{(i)}^2.$$

For all $0 \le k < p$ and all $t \in T_k$, under $H_k$ and on the event $A_k$, $U_{D_{k,t},N_{k,t}}$ is stochastically bounded as follows:

$$U_{D_{k,t},N_{k,t}} \preceq \frac{N_{k,t}}{D_{k,t}}\,\frac{Z_{D_{k,t},p-k}}{L_{k,D_{k,t}} + K_{n-p}},$$

where $K_{n-p}$ is a chi-square variable with $n-p$ degrees of freedom and $Z_{D_{k,t},p-k}$ is defined by (13). The simulations were performed with this new upper bound.
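The $(1-\alpha)$-quantile of this stochastic upper bound has no simple closed form, so one way to use it is Monte Carlo. The sketch below assumes that $Z_{D,p-k}$, $L_{k,D}$ and $K_{n-p}$ are drawn independently (the text does not spell out their joint law), that $Z_{D,p-k}$ is the sum of the $D$ largest squares among $p-k$ i.i.d. standard Gaussians as in (13), and that $N_{k,t}$ is supplied by the caller; the names, sample size and seed are illustrative.

```python
import numpy as np

def simulate_bound(k, D, N, n, p, n_sim=20000, seed=0):
    """Draws of (N / D) * Z_{D,p-k} / (L_{k,D} + K_{n-p}); requires n > p."""
    rng = np.random.default_rng(seed)
    # Z_{D,p-k}: sum of the D largest squares among p-k i.i.d. N(0,1).
    W2 = np.sort(rng.standard_normal((n_sim, p - k)) ** 2, axis=1)[:, ::-1]
    Z = W2[:, :D].sum(axis=1)
    # L_{k,D}: sum of l_(i)^2 for i = k+D+1, ..., p, where p i.i.d. N(0,1)
    # are sorted by decreasing absolute value (same as decreasing square).
    l2 = np.sort(rng.standard_normal((n_sim, p)) ** 2, axis=1)[:, ::-1]
    L = l2[:, k + D:].sum(axis=1)
    # Assumption: K_{n-p} independent of Z and L.
    K = rng.chisquare(n - p, size=n_sim)
    return (N / D) * Z / (L + K)

def bound_quantile(alpha, **kwargs):
    """Monte Carlo (1 - alpha)-quantile of the stochastic upper bound."""
    return float(np.quantile(simulate_bound(**kwargs), 1 - alpha))
```

For instance, `bound_quantile(0.05, k=2, D=4, N=94, n=100, p=80)` gives a threshold that, unlike the bound of Lemma 3.3, depends only on $(k, D, N, n, p)$ and can therefore be tabulated once per design.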

The results of the simulation study are presented in Tables A.1–A.3. The method displayed as FDR in the simulation tables corresponds to the procedure described in Bunea et al. [6], with $q$ (the user-chosen level) set to 0.1 and 0.05.

The $\ell_1$ penalty of the Lasso was tuned via 10-fold cross-validation. For the Bolasso technique, we chose 100 bootstrap iterations; the frequency threshold and the penalty were also tuned via 10-fold cross-validation.

When the Bolasso was used to order the variables, we stopped the dichotomy algorithm (see Section 3) as soon as $\min(p, 60)$ variables were ordered. The objective was to save computation, because it is difficult to distinguish the remaining variables beyond the $\min(p, 60)$-th position. The dichotomy algorithm assumes that once a variable reaches a frequency of 1, its frequency stays at 1 as the penalty decreases. In practice this assumption may fail, in which case the algorithm is restarted.
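The bootstrap selection frequencies on which the Bolasso and its dichotomy rely can be sketched with NumPy alone. Assumptions: the Lasso objective is $\frac{1}{2n}\|y - Xb\|^2 + \lambda\|b\|_1$, solved by cyclic coordinate descent with a fixed number of sweeps, and the penalty `lam` is given rather than tuned by cross-validation; the helper names are hypothetical. The dichotomy on the penalty would call `bolasso_frequencies` repeatedly and is not reproduced here.

```python
import numpy as np

def lasso_cd(X, y, lam, n_sweeps=200):
    """Cyclic coordinate descent for (1/(2n)) ||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float).copy()                 # current residual y - X b
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            r += X[:, j] * b[j]                # partial residual without feature j
            rho = X[:, j] @ r / n
            # Soft-thresholding update for coordinate j.
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * b[j]
    return b

def bolasso_frequencies(X, y, lam, n_boot=100, seed=0):
    """Fraction of bootstrap replicates in which each variable gets a non-zero coefficient."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)       # resample observations with replacement
        counts += lasso_cd(X[idx], y[idx], lam) != 0
    return counts / n_boot
```

Variables whose frequency equals 1 at a given penalty are the ones the Bolasso keeps; lowering `lam` lets more variables reach that frequency, which is what the dichotomy exploits to order them.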

For the three procedures presented in this paper, the results are displayed for a level $\alpha \in \{0.1, 0.05\}$. For the ordered case, $(X_i)_{1\le i\le p}$ became $(X_J, X_{J^c})$ and the collection $\{\alpha_t, t \in T_k\}$ was chosen according to procedure P1, which required more computational time than P2 but was far more powerful. For the non-ordered case, the collection $\{\alpha_t, t \in T_k\}$ was chosen with procedure P5 when $(X_i)_{1\le i\le p}$ was not an orthonormal family, as the variance was considered unknown in the simulations.

In all tables, the first row gives an estimate of $\delta = \mathbb{P}_\mu(A_{k_0}^c)$. This estimate is not given for the first procedure, as the variables have already been ordered, so $\delta = 0$. In the low-dimensional case, the parameter $m$ reflects the conditioning of the matrix $X$: $m = \max_{1\le j\le p} m_{jj}$, where $(m_{ij})_{1\le i,j\le p} = (X^TX)^{-1}/n$; a low $m$ means that the matrix $X$ is well-conditioned. The second row, "Truth", records the proportion of replications in which the true model is selected, i.e. in which we actually found $\hat J = J$. The third row, "Inclusions", records the number of selected variables, averaged over 500 replications. "Correct inclusions" records the number of relevant variables included in the selected model, also averaged over the replications. The MSE (Mean Squared Error) is averaged over all simulations, with $\mathrm{MSE} = \sum_{i=1}^{n}(\hat Y_i - (X\beta)_i)^2/n$, where $\hat Y = X\hat\beta$ and $\hat\beta$ is an estimate of $\beta$ with non-zero values only on $\hat J$.
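The per-replicate quantities behind the table rows can be computed as below and then averaged over the 500 replications. The helper names are hypothetical; `beta_hat` is assumed to be zero outside $\hat J$, as in the definition of the MSE above.

```python
import numpy as np

def selection_metrics(J_hat, J, X, beta, beta_hat):
    """Per-replicate values behind 'Truth', 'Inclusions', 'Correct incl.' and 'MSE'."""
    J_hat, J = set(J_hat), set(J)
    n = X.shape[0]
    return {
        "truth": float(J_hat == J),            # 1 if the true model is selected
        "inclusions": len(J_hat),              # number of selected variables
        "correct": len(J_hat & J),             # relevant variables among the selected ones
        "mse": float(np.sum((X @ beta_hat - X @ beta) ** 2) / n),
    }

def conditioning_m(X):
    """m = max_j m_jj with (m_ij) = (X^T X)^{-1} / n; only defined when p <= n."""
    n = X.shape[0]
    return float(np.max(np.diag(np.linalg.inv(X.T @ X))) / n)
```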

First, concerning the ordered-case procedure, Tables A.1–A.3 all show that it gave excellent results, even in the high-dimensional case where the number of variables exceeds the number of observations (Table A.3). These results are not surprising, because our choices of $\beta$ satisfied condition $(R)$ of Theorem 2.1 with a very small $\gamma$, so the probability of wrongly estimating $k_0$ was almost reduced to $\alpha$.

Concerning the other methods tested, Table A.1 shows that the FDR procedure performed slightly better in the orthonormal case when $\beta_j$ was small. Table A.2 shows the results in the common case where $(X_i)_{1\le i\le p}$ is not an orthonormal family: our procedure with the Bolasso order gave the best results, especially compared with the FDR procedure, which gave weak results.

Table A.3 focuses on the main aim of this paper, the high-dimensional case. We chose two values for the number of variables, $p = 300$ and $p = 600$. The table shows that FDR2 was far from satisfactory: almost no true model was recovered in the 500 simulations. In fact, Table A.3 shows that our "procbol" procedure outperformed the others when $p \gg n$. However, the combination of a small $\beta_j$ and a large number of variables induced a high $\delta$ and consequently decreased the power of the "procbol" method. Moreover, the results of the "procbol" method become less satisfactory as $p$ increases, because of the overestimation of the statistics in Lemma 3.3.
5. Conclusion

This paper tackled the problem of recovering the set of relevant variables $J$ in a sparse linear model, especially when the number of variables $p$ is larger than the sample size $n$. We proposed three new methods based on hypothesis testing to estimate $J$: one for ordered variables and two for non-ordered variables, the latter for known and unknown variance respectively. The three procedures are proved to be powerful under some conditions on the data. The simulations showed that these new procedures outperformed the other methods tested, both in a common case and in the high-dimensional case, which was the aim of this study. For instance, a method commonly used in applied sciences (FDR2) gave inaccurate results in the simulations. Finally, a crucial point in these new methods remains the way the variables are ordered. To improve the two-step procedure presented in this paper, a better way of ordering the variables than the Bolasso technique needs to be found.
Appendix A. Simulation results

                pre-ordered     procpval        procbol         FDR             Lasso   Bolasso
                α=0.1   α=0.05  α=0.1   α=0.05  α=0.1   α=0.05  q=0.1   q=0.05

k0 = 10, n = 100, p = 80, β = √n, m = 0.01
δ                               0.00            0.00            0.00
Truth           0.89    0.95    0.98    0.99    0.98    0.99    0.88    0.95    0.80    0.75
Inclusions      10.58   10.22   10.06   10.01   10.05   10.02   10.16   10.07   10.93   10.80
Correct incl.   10.00   10.00   10.00   9.99    10.00   10.00   10.00   10.00   10.00   10.00
MSE             0.11    0.11    0.11    0.10    0.10    0.10    0.11    0.11    0.15    0.15

k0 = 5, n = 100, p = 80, β = 6, m = 0.01
δ                               0.00            0.01            0.01
Truth           0.89    0.96    0.75    0.70    0.80    0.76    0.81    0.78    0.73    0.72
Inclusions      5.6     5.19    4.82    4.68    4.89    4.78    4.97    4.80    5.91    5.82
Correct incl.   5.00    5.00    4.77    4.67    4.82    4.74    4.85    4.74    4.96    4.99
MSE             0.06    0.06    0.13    0.16    0.11    0.14    0.11    0.15    0.12    0.10

Table A.1: The orthonormal case



                pre-ordered     procpval        procbol         FDR             Lasso   Bolasso
                α=0.1   α=0.05  α=0.1   α=0.05  α=0.1   α=0.05  q=0.1   q=0.05

k0 = 10, n = 100, p = 80, β = √n, m = 0.102
δ                               0.46            0.00            0.45
Truth           0.92    0.96    0.54    0.54    0.94    0.96    0.13    0.10    0.29    0.67
Inclusions      10.33   10.15   12.06   11.62   10.08   10.05   7.55    6.60    12.18   10.70
Correct incl.   10.00   10.00   9.92    9.90    10.00   10.00   7.34    6.53    10.00   9.99
MSE             0.12    0.11    0.20    0.22    0.11    0.11    2.97    3.72    0.18    0.14

k0 = 5, n = 100, p = 80, β = 6, m = 0.103
δ                               0.88            0.07            0.82
Truth           0.91    0.95    0.11    0.11    0.86    0.84    0.00    0.00    0.27    0.47
Inclusions      5.37    5.13    6.30    5.54    5.00    4.94    0.98    0.66    7.22    6.14
Correct incl.   5.00    5.00    4.05    3.90    4.91    4.87    0.86    0.62    4.94    4.94
MSE             0.06    0.06    0.40    0.44    0.08    0.09    1.42    1.45    0.16    0.13

k0 = 10, n = 500, p = 450, β = √n, m = 0.040
δ                               0.02            0.00            0.01
Truth           0.91    0.95    0.98    0.98    0.94    0.96    0.84    0.85    0.88    0.99
Inclusions      11.09   10.32   10.05   10.05   10.07   10.04   10.12   9.99    10.26   10.01
Correct incl.   10.00   10.00   10.00   10.00   10.00   10.00   9.94    9.90    10.00   10.00
MSE             0.02    0.02    0.02    0.02    0.02    0.02    0.30    0.31    0.02    0.02

k0 = 5, n = 500, p = 450, β = 6, m = 0.044
δ                               1.00            0.07            1.00
Truth           0.89    0.95    0.00    0.00    0.86    0.84    0.00    0.00    0.68    0.27
Inclusions      7.35    6.06    2.19    1.68    4.95    4.88    0.09    0.05    5.37    3.78
Correct incl.   5.00    5.00    1.22    1.16    4.90    4.85    0.07    0.05    4.91    3.78
MSE             0.02    0.01    0.27    0.28    0.02    0.02    0.36    0.36    0.03    0.09

Table A.2: The non-orthonormal case, p < n
                pre-ordered     procpval        procbol         FDR2            Lasso   Bolasso
                α=0.1   α=0.05  α=0.1   α=0.05  α=0.1   α=0.05  q=0.1   q=0.05

k0 = 10, n = 100, p = 300, β = √n
δ                               1.00            0.00            1.00
Truth           0.91    0.96    0.00    0.00    0.99    0.99    0.00    0.00    0.60    0.78
Inclusions      10.53   10.14   8.92    8.68    10.01   10.01   4.17    3.38    11.05   10.46
Correct incl.   10.00   10.00   8.35    8.36    10.00   10.00   4.17    3.38    10.00   10.00
MSE             0.11    0.10    1.56    1.63    0.10    0.10    5.21    6.04    0.15    0.13

k0 = 5, n = 100, p = 300, β = 6
δ                               0.65            0.09            0.60
Truth           0.93    0.96    0.33    0.33    0.79    0.74    0.03    0.01    0.38    0.56
Inclusions      5.438   5.16    4.62    4.50    4.88    4.78    3.22    2.74    7.57    6.32
Correct incl.   5.00    5.00    4.32    4.24    4.82    4.74    3.15    2.71    4.92    4.90
MSE             0.06    0.05    0.27    0.29    0.11    0.14    0.66    0.79    0.18    0.15

k0 = 10, n = 100, p = 600, β = √n
δ                               1.00            0.17            1.00
Truth           0.89    0.95    0.00    0.00    0.83    0.83    0.00    0.00    0.00    0.25
Inclusions      10.66   10.21   4.88    4.36    10.30   10.20   2.33    2.02    16.97   12.24
Correct incl.   10.00   10.00   4.68    4.23    9.99    9.99    2.33    2.02    9.99    9.99
MSE             0.12    0.11    4.11    4.56    0.11    0.11    6.34    6.69    0.31    0.20

k0 = 5, n = 100, p = 600, β = 6
δ                               0.95            0.30            0.92
Truth           0.912   0.96    0.05    0.05    0.62    0.56    0.00    0.00    0.11    0.26
Inclusions      5.43    5.12    3.36    3.22    4.62    4.48    1.48    1.18    10.52   7.49
Correct incl.   5.00    5.00    3.14    3.04    4.50    4.39    1.46    1.17    4.59    4.65
MSE             0.06    0.06    0.59    0.62    0.22    0.25    1.10    1.22    0.37    0.30

Table A.3: The high-dimensional case, p > n



Appendix B. Proofs 

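Proof of Theorem 2.1. Let $k \le k_0 - 1$ and assume that $(R_k)$ holds. According to Baraud et al. [2], the power of the test of $H_k$, $\mathbb{P}_\mu(T_{k,\alpha} > 0)$, is greater than $1 - \gamma/k_0$. This is equivalent to $\mathbb{P}_\mu(H_k \text{ is accepted}) \le \gamma/k_0$.

Moreover, for all $k \ge k_0$, $\mathbb{P}_\mu(T_{k,\alpha} > 0) \le \alpha$, since $\alpha$ is the level of the test of $H_k$. Then we have:

$$\mathbb{P}_\mu(\hat k > k_0) \le \mathbb{P}_\mu(H_{k_0} \text{ is rejected}) = \mathbb{P}_\mu(T_{k_0,\alpha} > 0) \le \alpha$$

and

$$\mathbb{P}_\mu(\hat k < k_0) \le \sum_{j=0}^{k_0-1}\mathbb{P}_\mu(H_j \text{ is accepted}) \le k_0\,\frac{\gamma}{k_0}.$$

Hence we obtain

$$\mathbb{P}_\mu(\hat k \neq k_0) \le \mathbb{P}_\mu(\hat k < k_0) + \mathbb{P}_\mu(\hat k > k_0) \le \gamma + \alpha,$$

which concludes the proof of (5).

□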
Proof of Corollary 2.3. Let $p < n$ and $k_0 < p$ such that $p = \lambda n$, where $\lambda < 1$. We set $\alpha_n = \gamma_n = 1/n$. We set, for all $k$ and all $t$, $\alpha_t = \alpha_n/\log_2(n)$; thus $\sum_{t\in T_k}\alpha_t \le \alpha_n$. With these choices, the conditions of Remark 2.2 are verified. Indeed, $\alpha_t \ge \exp(-N_{k,t}/10)$ because $k + 2^t \le p = \lambda n$. Moreover, $\gamma_n = 1/n \ge 2\exp(-N_{k,t}/21)$ and the ratio

$$\frac{D_{k,t} + L_{k,t}}{N_{k,t}} = \frac{2^t + \log(1/\alpha_t)}{n - k - 2^t} \le \frac{\lambda n + \log(1/\alpha_t)}{(1-\lambda)n}$$

remains bounded. With these conditions, $C_2(k,t)$ and $C_3(k,t)$ behave like constants, and thus, for $t = \min\{\lfloor\log_2(p-k)\rfloor,\ \inf\{t : 2^t \ge k_0\}\}$, the condition $(R_k)$ is verified for all $k \le k_0 - 1$ and $n$ large enough:

$$\Pi_{V_k^{\perp}}(\mu) \neq 0 \quad\text{and}\quad \frac{\sigma^2}{n}\left[C_2(k,t)\sqrt{2^t\log\left(\frac{2k_0}{\gamma_n}\right)} + C_3(k,t)\log\left(\frac{2k_0}{\gamma_n}\right)\right] \longrightarrow 0,$$

thus Theorem 2.1 can be applied and $\mathbb{P}_\mu(\hat k \neq k_0) \le \gamma_n + \alpha_n$. In particular, $\mathbb{P}_\mu(\hat k \neq k_0) \to 0$.

□



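Proof of Theorem 3.2. Let $k < k_0$. We use the inequality $(a+b)^2 \ge a^2/2 - b^2$, valid for all $(a,b) \in \mathbb{R}^2$. On the event $A_{k_0}$, for all $t \in I = \{0,\dots,\lfloor\log_2(k_0-k)\rfloor\}$:

$$\begin{aligned}
\|\Pi_{S_{k,t}}Y\|_n^2 &= \|\Pi_{S_{k,t}}(\mu+\epsilon)\|_n^2\\
&\ge \tfrac12\|\Pi_{S_{k,t}}\mu\|_n^2 - \|\Pi_{S_{k,t}}\epsilon\|_n^2\\
&\ge \tfrac12\inf\{\|\Pi_S\mu\|_n^2,\ S\in B_{2^t}\} - \|\Pi_{S_{k,t}}\epsilon\|_n^2,
\end{aligned}$$

where $B_{2^t} = \{\mathrm{span}(X_I),\ I \subset J,\ |I| = 2^t\}$. Hence:

$$\begin{aligned}
&\mathbb{P}_\mu\left(\left\{\forall t\in I,\ \tfrac{1}{\sigma^2}\|\Pi_{S_{k,t}}Y\|_n^2 \le \bar U_{k,t}^{-1}(\alpha_t)\right\}\cap A_{k_0}\right)\\
&\quad= \mathbb{P}_\mu\left(\left\{\forall t\in I,\ \tfrac{1}{\sigma^2}\|\Pi_{S_{k,t}}(\mu+\epsilon)\|_n^2 \le \bar U_{k,t}^{-1}(\alpha_t)\right\}\cap A_{k_0}\right)\\
&\quad\le \mathbb{P}_\mu\left(\left\{\forall t\in I,\ \tfrac{1}{2\sigma^2}\inf\{\|\Pi_S\mu\|_n^2,\ S\in B_{2^t}\} - \tfrac{1}{\sigma^2}\|\Pi_{S_{k,t}}\epsilon\|_n^2 \le \bar U_{k,t}^{-1}(\alpha_t)\right\}\cap A_{k_0}\right).
\end{aligned}$$

On the event $A_{k_0}$ and for $k + 2^t \le k_0$, we have $\|\Pi_{S_{k,t}}\epsilon\|_n^2 \le \sup\{\|\Pi_S\epsilon\|_n^2,\ S\in B_{2^t}\}$. Moreover, for $S \in B_{2^t}$, $\|\Pi_S\epsilon\|_n^2 \sim \frac{\sigma^2}{n}\chi^2_{2^t}$. Note that $|B_{2^t}| = \binom{k_0}{2^t}$. Let $Z_t = \sup\{\|\Pi_S\epsilon\|_n^2/\sigma^2,\ S\in B_{2^t}\}$ and let $\bar Z_t(u)$ denote the probability for this statistic to be larger than $u$. We denote by $\bar\chi_d(u)$ the probability for a chi-square variable with $d$ degrees of freedom to be larger than $u$. We have an upper bound of the $(1-u)$-quantile of the statistic $Z_t$: $\bar Z_t^{-1}(u) \le \bar\chi^{-1}_{2^t}(u/|B_{2^t}|)/n$. Indeed: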
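$$\mathbb{P}\left(Z_t > \frac1n\bar\chi^{-1}_{2^t}\!\left(\frac{u}{|B_{2^t}|}\right)\right) \le \sum_{S\in B_{2^t}}\mathbb{P}\left(\|\Pi_S\epsilon\|_n^2 > \frac{\sigma^2}{n}\bar\chi^{-1}_{2^t}\!\left(\frac{u}{|B_{2^t}|}\right)\right) \le |B_{2^t}|\,\frac{u}{|B_{2^t}|} \le u.$$

Therefore, the following condition

$$(\mathrm{cond}_k):\quad \exists t\in I,\ \frac{1}{2\sigma^2}\inf\{\|\Pi_S\mu\|_n^2,\ S\in B_{2^t}\} \ge \frac1n\bar\chi^{-1}_{2^t}\!\left(\frac{\gamma}{k_0|B_{2^t}|}\right) + \bar U^{-1}_{k,t}(\alpha_t)$$

implies that:

$$\mathbb{P}\left(\left\{\forall t\in I,\ \frac{1}{2\sigma^2}\inf\{\|\Pi_S\mu\|_n^2,\ S\in B_{2^t}\} - \frac{1}{\sigma^2}\|\Pi_{S_{k,t}}\epsilon\|_n^2 \le \bar U^{-1}_{k,t}(\alpha_t)\right\}\cap A_{k_0}\right) \le \frac{\gamma}{k_0}. \tag{B.1}$$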



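For $d > 0$, let us denote

$$G_{k,d} = \{\mathrm{span}(X_I),\ I \subset \{1,\dots,p\}\setminus\{(1),\dots,(k)\},\ |I| = d\}. \tag{B.2}$$

Note that $|G_{k,d}| = \binom{p-k}{d}$. Then $\bar U_{k,t} \le \sup\{\|\Pi_S\epsilon\|_n^2/\sigma^2,\ S\in G_{k,2^t}\}$. This inequality leads to an upper bound of the $(1-u)$-quantile of $\bar U_{k,t}$: $\bar U^{-1}_{k,t}(u) \le \bar\chi^{-1}_{2^t}(u/|G_{k,2^t}|)/n$.

Using $\bar U^{-1}_{k,t}(u) \le \bar\chi^{-1}_{2^t}(u/|G_{k,2^t}|)/n$ in the condition $(\mathrm{cond}_k)$, we obtain the condition $(\mathrm{cond}_{2,k})$, which still implies (B.1):

$$(\mathrm{cond}_{2,k}):\quad \exists t\in I,\ \frac{1}{2\sigma^2}\inf\{\|\Pi_S\mu\|_n^2,\ S\in B_{2^t}\} \ge \frac1n\bar\chi^{-1}_{2^t}\!\left(\frac{\gamma}{k_0|B_{2^t}|}\right) + \frac1n\bar\chi^{-1}_{2^t}\!\left(\frac{\alpha_t}{|G_{k,2^t}|}\right).$$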

Moreover, Laurent and Massart [10] showed that, for $K \sim \chi^2_d$ and all $x > 0$,

$$\mathbb{P}\left(K \ge d + 2\sqrt{dx} + 2x\right) \le e^{-x}. \tag{B.3}$$
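Inequality (B.3) is easy to check numerically. The sketch below compares the empirical tail probability of a $\chi^2_d$ sample at the threshold $d + 2\sqrt{dx} + 2x$ with $e^{-x}$, and turns (B.3) into the quantile bound $\bar\chi^{-1}_d(u) \le d + 2\sqrt{d\log(1/u)} + 2\log(1/u)$ (take $x = \log(1/u)$) that is used in the rest of this proof; the sample sizes, seeds and helper names are arbitrary choices.

```python
import numpy as np

def lm_tail_check(d, x, n_sim=200000, seed=0):
    """Empirical P(K >= d + 2 sqrt(d x) + 2 x) for K ~ chi2_d, returned with e^{-x}."""
    rng = np.random.default_rng(seed)
    K = rng.chisquare(d, size=n_sim)
    threshold = d + 2 * np.sqrt(d * x) + 2 * x
    return float(np.mean(K >= threshold)), float(np.exp(-x))

def chi2_quantile_bound(d, u):
    """Upper bound on the (1-u)-quantile of chi2_d implied by (B.3) with x = log(1/u)."""
    x = np.log(1.0 / u)
    return float(d + 2 * np.sqrt(d * x) + 2 * x)
```

The bound is conservative: for $d = 8$ and $u = 0.01$ it gives about 29.4, while the exact 99% quantile of $\chi^2_8$ is about 20.1.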



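Then, for $d = 2^t$ and $x_u = \log(k_0|B_{2^t}|/\gamma)$, we have $\bar\chi^{-1}_{2^t}\!\left(\frac{\gamma}{k_0|B_{2^t}|}\right) \le 2^t + 2\sqrt{2^t x_u} + 2x_u$. Since $\binom{D}{d} \le (eD/d)^d$, $\sqrt{u+v} \le \sqrt u + \sqrt v$ for all $u, v > 0$, and $2\sqrt u \le 1 + u$ for all $u \ge 0$, we obtain $\log|B_{2^t}| \le 2^t\log(ek_0/2^t)$, thus

$$\bar\chi^{-1}_{2^t}\!\left(\frac{\gamma}{k_0|B_{2^t}|}\right) \le 2^t\left[1 + 2\sqrt{\log\frac{ek_0}{2^t}} + 2\log\frac{ek_0}{2^t}\right] + 2\left[\sqrt{2^t\log\frac{k_0}{\gamma}} + \log\frac{k_0}{\gamma}\right] \le 2^t\left[5 + 4\log\frac{k_0}{2^t}\right] + 2\left[\sqrt{2^t\log\frac{k_0}{\gamma}} + \log\frac{k_0}{\gamma}\right].$$

For $d = 2^t$ and $x_u = \log(|G_{k,2^t}|/\alpha_t)$, we obtain:

$$\bar\chi^{-1}_{2^t}\!\left(\frac{\alpha_t}{|G_{k,2^t}|}\right) \le 2^t\left[1 + 2\sqrt{\log\frac{e(p-k)}{2^t}} + 2\log\frac{e(p-k)}{2^t}\right] + 2\left[\sqrt{2^t\log\frac{1}{\alpha_t}} + \log\frac{1}{\alpha_t}\right] \le 2^t\left[5 + 4\log\frac{p-k}{2^t}\right] + 2\left[\sqrt{2^t\log\frac{1}{\alpha_t}} + \log\frac{1}{\alpha_t}\right].$$

We also have an upper bound of $1/\alpha_t$ for all $t \in T_k$. Indeed, the construction of $\{\alpha_t, t\in T_k\}$ with procedure P3 gives $\mathbb{P}_\mu\left(\exists t\in T_k,\ \bar U_{k,t} > \bar U^{-1}_{k,t}(\alpha_t)\right) = \alpha$. Thus $\alpha_t \ge \alpha/|T_k|$ for all $t \in T_k$, since

$$\mathbb{P}\left(\exists t\in T_k,\ \bar U_{k,t} > \bar U^{-1}_{k,t}(\alpha/|T_k|)\right) \le \alpha.$$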



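Hence we obtain

$$\bar\chi^{-1}_{2^t}\!\left(\frac{\alpha_t}{|G_{k,2^t}|}\right) \le 2^t\left[5 + 4\log\frac{p-k}{2^t}\right] + 2\left[\sqrt{2^t\log\frac{|T_k|}{\alpha}} + \log\frac{|T_k|}{\alpha}\right].$$

Using the inequality $a\sqrt u + b\sqrt v \le \sqrt{a^2+b^2}\,\sqrt{u+v}$, which holds for any positive numbers $a, b, u, v$, we finally get the condition $(R_{2,k})$, which implies (B.1):

$(R_{2,k})$: there exists $t \in I$ such that

$$\inf\{\|\Pi_S\mu\|_n^2,\ S\in B_{2^t}\} \ge \frac{2\sigma^2}{n}\left[2^t\left(10 + 4\log\frac{(p-k)k_0}{2^{2t}}\right) + 2\sqrt{2\cdot 2^t\log\frac{k_0|T_k|}{\gamma\alpha}} + 2\log\frac{k_0|T_k|}{\gamma\alpha}\right].$$

This leads to

$$\mathbb{P}_\mu\left(\left\{\forall t\in I,\ U_{k,t} \le \bar U^{-1}_{k,t}(\alpha_t)\right\}\cap A_{k_0}\right) \le \frac{\gamma}{k_0}.$$

Then, for all $k < k_0$, $\mathbb{P}_\mu(\{\hat k_{A\,bis} = k\}\cap A_{k_0}) \le \gamma/k_0$, where $\hat k_{A\,bis} = |\hat J|$. We can now calculate $\mathbb{P}_\mu(\hat J \neq J)$:

$$\begin{aligned}
\mathbb{P}_\mu(\hat J \neq J) &\le \mathbb{P}_\mu(\{\hat J \neq J\}\cap A_{k_0}) + \mathbb{P}_\mu(A_{k_0}^c)\\
&\le \sum_{k=0}^{k_0-1}\mathbb{P}_\mu(\{\hat k_{A\,bis} = k\}\cap A_{k_0}) + \mathbb{P}_\mu(\{\hat k_{A\,bis} > k_0\}\cap A_{k_0}) + \mathbb{P}_\mu(A_{k_0}^c)\\
&\le k_0\,\frac{\gamma}{k_0} + \alpha + \delta.
\end{aligned}$$

And then (12) is proved.

□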



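Proof of Lemma 3.3. Under $H_k$ and on the event $A_k$:

$$U_{k,t} = \|\Pi_{S_{k,t}}Y\|_n^2/\sigma^2 = \|\Pi_{S_{k,t}}(\mu+\epsilon)\|_n^2/\sigma^2 = \|\Pi_{S_{k,t}}\epsilon\|_n^2/\sigma^2.$$

The family $(X_i)_i$ is orthonormal, thus $U_{k,t} = \sum_{j=k+1}^{k+2^t}\langle\epsilon,X_{(j)}\rangle_n^2/\sigma^2$.

As $\epsilon \sim \mathcal{N}_n(0,\sigma^2 I_n)$, we have $\langle\epsilon,X_j\rangle \sim \mathcal{N}(0,\sigma^2)$ for all $1 \le j \le p$, and the variables $\langle\epsilon,X_j\rangle$, $j = 1,\dots,p$, are i.i.d. Thus $\{\langle\epsilon,X_{(j)}\rangle,\ j > k\} = \{\sigma W_1,\dots,\sigma W_{p-k}\}$, where $W_1,\dots,W_{p-k}$ are i.i.d. standard Gaussian variables. So

$$\sum_{j=k+1}^{k+2^t}\frac{\langle\epsilon,X_{(j)}\rangle_n^2}{\sigma^2} \le \frac1n\sum_{j=1}^{2^t}W_{(j)}^2 = \frac{Z_{2^t,p-k}}{n}.$$

□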
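Proof of Corollary 3.4. Let $k < k_0$. The permutation $\sigma_2$ is defined such that $|\beta_{\sigma_2(1)}| \le \dots \le |\beta_{\sigma_2(k_0)}|$; denote $\epsilon_{(j)} = \langle\epsilon, X_{(j)}\rangle_n$ for all $j \in \{k+1,\dots,k+2^t\}$, with $k + 2^t \le k_0$.

Similarly to the proof of Theorem 3.2, using the equality $\inf\{\|\Pi_S\mu\|_n^2,\ S\in B_{2^t}\} = \sum_{j=1}^{2^t}\beta^2_{\sigma_2(j)}$, and denoting by $\bar Z^{-1}_{d,D}(u)$ the $(1-u)$-quantile of $Z_{d,D}$, we get that:

$$\begin{aligned}
&\mathbb{P}_\mu\left(\left\{\forall t\in I,\ \frac{1}{\sigma^2}\|\Pi_{S_{k,t}}Y\|_n^2 \le \frac{\bar Z^{-1}_{D_{k,t},p-k}(\alpha_t)}{n}\right\}\cap A_{k_0}\right)\\
&\quad\le \mathbb{P}_\mu\left(\left\{\forall t\in I,\ \frac{1}{2\sigma^2}\sum_{j=1}^{2^t}\beta^2_{\sigma_2(j)} - \frac{1}{\sigma^2}\sum_{j=k}^{k+2^t-1}\epsilon^2_{(j+1)} \le \frac{\bar Z^{-1}_{D_{k,t},p-k}(\alpha_t)}{n}\right\}\cap A_{k_0}\right).
\end{aligned}$$

On the event $A_{k_0}$, $\{\langle\epsilon,X_{(j+1)}\rangle,\ k \le j \le k+2^t-1\} \subset \{\langle\epsilon,X_j\rangle,\ j\in J\}$, which gives the stochastic upper bound $\frac{1}{\sigma^2}\sum_{j=k}^{k+2^t-1}\epsilon^2_{(j+1)} \preceq \frac1n Z_{2^t,k_0}$.

Hence the following condition

$$(\mathrm{cond}_{3,k}):\quad \exists t\in I,\ \frac{1}{2\sigma^2}\sum_{j=1}^{2^t}\beta^2_{\sigma_2(j)} \ge \frac1n\bar Z^{-1}_{2^t,k_0}\!\left(\frac{\gamma}{k_0}\right) + \frac1n\bar Z^{-1}_{D_{k,t},p-k}(\alpha_t)$$

implies that

$$\mathbb{P}_\mu\left(\left\{\forall t\in I,\ \frac{1}{\sigma^2}\|\Pi_{S_{k,t}}Y\|_n^2 \le \frac{\bar Z^{-1}_{D_{k,t},p-k}(\alpha_t)}{n}\right\}\cap A_{k_0}\right) \le \frac{\gamma}{k_0}. \tag{B.4}$$

Let $0 < u < 1$, $0 < D$ and $d \le D$. In the following, we study the behavior of the $(1-u)$-quantile of the statistic $Z_{d,D}$ in order to obtain a more explicit condition than $(\mathrm{cond}_{3,k})$.

Let us define $V_{d,D} = \{I \subset \{1,\dots,D\},\ |I| = d\}$. Note that $|V_{d,D}| = \binom{D}{d}$. Recall that $Z_{d,D}$ is defined by (13) as $Z_{d,D} = \sum_{i=1}^{d}W_{(i)}^2$, where $W_1,\dots,W_D$ are $D$ i.i.d. standard Gaussian variables ordered as $|W_{(1)}| \ge \dots \ge |W_{(D)}|$. We have $Z_{d,D} \le \sup\{\sum_{i\in I}W_i^2,\ I\in V_{d,D}\}$, and for each $I \in V_{d,D}$, $\sum_{i\in I}W_i^2 \sim \chi^2_d$. We obtain that the $(1-u)$-quantile of $Z_{d,D}$ is lower than $\bar\chi^{-1}_d(u/|V_{d,D}|)$:

$$\begin{aligned}
\mathbb{P}\left(Z_{d,D} > \bar\chi^{-1}_d\!\left(\frac{u}{|V_{d,D}|}\right)\right) &\le \mathbb{P}\left(\sup_{I\in V_{d,D}}\sum_{i\in I}W_i^2 > \bar\chi^{-1}_d\!\left(\frac{u}{|V_{d,D}|}\right)\right)\\
&\le \sum_{I\in V_{d,D}}\mathbb{P}\left(\sum_{i\in I}W_i^2 > \bar\chi^{-1}_d\!\left(\frac{u}{|V_{d,D}|}\right)\right) \le u.
\end{aligned}$$

Using the upper bound of $\bar\chi^{-1}_d(u)$ from the proof of Theorem 3.2, we get the condition $(R_{2bis,k})$ from an upper bound of the right-hand side of the condition $(\mathrm{cond}_{3,k})$. The end of the proof is the same as in the proof of Theorem 3.2. □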
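The union-bound argument for $Z_{d,D}$ can be checked by simulation. The sketch draws $Z_{d,D}$ as in (13) and compares its empirical $(1-u)$-quantile with $\bar\chi^{-1}_d(u/|V_{d,D}|)$, itself bounded through (B.3) with $x = \log(|V_{d,D}|/u)$ so that only NumPy and the standard library are needed; the helper names are hypothetical.

```python
import math
import numpy as np

def z_stat(d, D, n_sim, rng):
    """Draws of Z_{d,D}: sum of the d largest W_i^2 among D i.i.d. N(0,1)."""
    W2 = np.sort(rng.standard_normal((n_sim, D)) ** 2, axis=1)   # ascending
    return W2[:, D - d:].sum(axis=1)                             # d largest squares

def union_quantile_bound(d, D, u):
    """Bound on the (1-u)-quantile of Z_{d,D}: the chi2_d quantile at level
    u / |V_{d,D}|, bounded via (B.3) with x = log(|V_{d,D}| / u)."""
    x = math.log(math.comb(D, d) / u)
    return d + 2 * math.sqrt(d * x) + 2 * x
```

The bound grows only logarithmically with $\binom{D}{d}$, which is why the final condition involves $\log\frac{p-k}{2^t}$ rather than $p-k$ itself.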
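Proof of Theorem 3.6. Let $k < k_0$ and $0 < \gamma < 1$. Denote $I = \{0,\dots,\lfloor\log_2(k_0-k)\rfloor\}$. From the proof of Theorem 3.2 (more precisely the condition $(\mathrm{cond}_k)$), if the following condition is verified:

$$\exists t\in I,\ \frac12\inf\{\|\Pi_S\mu\|_n^2,\ S\in B_{2^t}\} \ge \frac{D_{k,t}}{N_{k,t}}\bar T^{-1}_{k,t}(\alpha_t)\,Q_{1-\gamma/(2k_0)} + \frac{\sigma^2}{n}\bar\chi^{-1}_{2^t}\!\left(\frac{\gamma}{2k_0|B_{2^t}|}\right), \tag{B.5}$$

where $Q_{1-u}$ denotes the $(1-u)$-quantile of the statistic $\|Y - \Pi_{V_{(k)}+S_{(k),t}}Y\|_n^2$ under the event $A_{k_0}$, then we have:

$$\mathbb{P}_\mu\left(\left\{\forall t\in I,\ \|\Pi_{S_{k,t}}Y\|_n^2 \le \frac{D_{k,t}}{N_{k,t}}\bar T^{-1}_{k,t}(\alpha_t)\,Q_{1-\gamma/(2k_0)}\right\}\cap A_{k_0}\right) \le \frac{\gamma}{2k_0}.$$

Since

$$\mathbb{P}\left(\{\forall t\in I,\ U_{D_{k,t},N_{k,t}} \le \bar T^{-1}_{k,t}(\alpha_t)\}\cap A_{k_0}\right) \le \inf_{t\in I}\mathbb{P}\left(\{U_{D_{k,t},N_{k,t}} \le \bar T^{-1}_{k,t}(\alpha_t)\}\cap A_{k_0}\right)$$

and since

$$\begin{aligned}
\mathbb{P}\left(\{U_{D_{k,t},N_{k,t}} \le \bar T^{-1}_{k,t}(\alpha_t)\}\cap A_{k_0}\right) &\le \mathbb{P}\left(\{\|Y - \Pi_{V_{(k)}+S_{(k),t}}Y\|_n^2 > Q_{1-\gamma/(2k_0)}\}\cap A_{k_0}\right)\\
&\quad + \mathbb{P}\left(\left\{\|\Pi_{S_{k,t}}Y\|_n^2 \le \frac{D_{k,t}}{N_{k,t}}\bar T^{-1}_{k,t}(\alpha_t)\,Q_{1-\gamma/(2k_0)}\right\}\cap A_{k_0}\right)\\
&\le \frac{\gamma}{2k_0} + \frac{\gamma}{2k_0} \le \frac{\gamma}{k_0},
\end{aligned}$$

the condition (B.5) implies that

$$\mathbb{P}\left(\{\forall t\in I,\ U_{D_{k,t},N_{k,t}} \le \bar T^{-1}_{k,t}(\alpha_t)\}\cap A_{k_0}\right) \le \frac{\gamma}{k_0}. \tag{B.6}$$

In the following, we give an upper bound of the right-hand side of (B.5). To do so, we have to bound $\bar T^{-1}_{k,t}(\alpha_t)$ and $Q_{1-\gamma/(2k_0)}$.

Assume we are on the event $A_k$ and under $H_k$; then

$$U_{D_{k,t},N_{k,t}} = \frac{N_{k,t}\|\Pi_{S_{(k),t}}Y\|_n^2}{D_{k,t}\|Y - \Pi_{V_{(k)}+S_{(k),t}}Y\|_n^2} = \frac{N_{k,t}\|\Pi_{S_{(k),t}}\epsilon\|_n^2}{D_{k,t}\|\epsilon - \Pi_{V_{(k)}+S_{(k),t}}\epsilon\|_n^2}.$$

As we are on the event $A_k$ and under $H_k$, the space $V_{(k)}$ is not random. Thus, for any subspace $S$ of dimension $D_{k,t} = 2^t$, we have $\|\Pi_S Y\|_n^2 = \|\Pi_S\epsilon\|_n^2 \sim \sigma^2\chi^2_{D_{k,t}}/n$ and $\|Y - \Pi_{V_{(k)}}Y - \Pi_S Y\|_n^2 = \|\epsilon - \Pi_{V_{(k)}+S}\epsilon\|_n^2 \sim \sigma^2\chi^2_{N_{k,t}}/n$. Hence

$$\frac{N_{k,t}\|\Pi_S\epsilon\|_n^2}{D_{k,t}\|\epsilon - \Pi_{V_{(k)}+S}\epsilon\|_n^2} \sim F_{D_{k,t},N_{k,t}}.$$

Thus, on the event $A_k$ and under $H_k$,

$$\bar T_{k,t} \le \sup\left\{\frac{N_{k,t}\|\Pi_S\epsilon\|_n^2}{D_{k,t}\|\epsilon - \Pi_{V_{(k)}+S}\epsilon\|_n^2},\ S\in G_{k,2^t}\right\},$$

where $G_{k,2^t}$ is defined by (B.2).

We deduce that the $(1-u)$-quantile of $\bar T_{k,t}$ is lower than $\bar F^{-1}_{D_{k,t},N_{k,t}}(u/|G_{k,2^t}|)$. Indeed:

$$\begin{aligned}
\mathbb{P}\left(\bar T_{k,t} > \bar F^{-1}_{D_{k,t},N_{k,t}}\!\left(\frac{u}{|G_{k,2^t}|}\right)\right) &\le \mathbb{P}\left(\sup_{S\in G_{k,2^t}}\frac{N_{k,t}\|\Pi_S\epsilon\|_n^2}{D_{k,t}\|\epsilon - \Pi_{V_{(k)}+S}\epsilon\|_n^2} > \bar F^{-1}_{D_{k,t},N_{k,t}}\!\left(\frac{u}{|G_{k,2^t}|}\right)\right)\\
&\le |G_{k,2^t}|\,\frac{u}{|G_{k,2^t}|} \le u.
\end{aligned}$$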
\Gk,v\ 
Baraud et al. [2] gave an upper bound of $\bar F^{-1}_{D,N}(u)$ for $0 < D$, $0 < N$ and $0 < u$:

Since $\exp(u) - 1 \le u\exp(u)$ for any $u > 0$, $\sqrt{u+v} \le \sqrt u + \sqrt v$ for all $u > 0$, $v > 0$, and since $\alpha_t \ge \alpha/|T_k|$, we derive that:

$$2^t\,\bar T^{-1}_{k,t}(\alpha_t) \le 2^t\left[1 + K_3(k,t)\log\frac{e(p-k)}{2^t}\right] + 2\,\cdots$$

where $K_1(k,t) = \sqrt{1 + D_{k,t}/N_{k,t}}$, $K_2(k,t) = (1 + 2D_{k,t}/N_{k,t})\,M$ and $K_3(k,t) = 2K_1(k,t) + K_2(k,t)$, with

$$L_t = \log(|T_k|/\alpha),\quad m_t = \exp(4L_t/N_{k,t}),\quad m_p = \exp\left(\frac{4D_{k,t}}{N_{k,t}}\log\frac{e(p-k)}{D_{k,t}}\right),\quad M = 2m_t m_p.$$

Since $\sqrt{ab} + mb \le a/2 + (m + 1/2)b$ holds for any positive numbers $a$, $b$, $m$, we obtain:

$$2^t\,\bar T^{-1}_{k,t}(\alpha_t) \le 2^t\left[1 + K_1(k,t) + K_3(k,t)\log\frac{e(p-k)}{2^t}\right] + (1 + K_2(k,t))\log\frac{|T_k|}{\alpha}. \tag{B.7}$$



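We now have to find an upper bound of $Q_{1-\gamma/(2k_0)}$, which is defined by $\mathbb{P}(\{\|Y - \Pi_{V_{(k)}+S_{(k),t}}Y\|_n^2 > Q_{1-\gamma/(2k_0)}\}\cap A_{k_0}) \le \gamma/(2k_0)$.

We always have $\|Y - \Pi_{V_{(k)}+S_{(k),t}}Y\|_n^2 \le \|\mu\|_n^2 + \|\epsilon\|_n^2$. Thus, for all $0 < u < 1$, the $(1-u)$-quantile of $\|Y - \Pi_{V_{(k)}+S_{(k),t}}Y\|_n^2$ is lower than the $(1-u)$-quantile of $\|\mu\|_n^2 + \|\epsilon\|_n^2$.

As $\|\epsilon\|_n^2 \sim \sigma^2\chi^2_n/n$, we can use (B.3) with $x_u = \log(2k_0/\gamma)$ and obtain

$$\bar\chi^{-1}_n\!\left(\frac{\gamma}{2k_0}\right) \le n + 2\sqrt{n x_u} + 2x_u.$$

Therefore

$$Q_{1-\gamma/(2k_0)} \le \|\mu\|_n^2 + \sigma^2\left(1 + 2\sqrt{\frac{x_u}{n}} + \frac{2x_u}{n}\right), \tag{B.8}$$

and as $1 + 2\sqrt u + 2u \le 2 + 3u$, we get

$$Q_{1-\gamma/(2k_0)} \le \|\mu\|_n^2 + \sigma^2\left(2 + \frac3n\log\frac{2k_0}{\gamma}\right). \tag{B.9}$$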



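Combining (B.7)–(B.9) in (B.5) and using that

$$\bar\chi^{-1}_{2^t}\!\left(\frac{\gamma}{2k_0|B_{2^t}|}\right) \le 2^t\left[5 + 4\log\frac{k_0}{2^t}\right] + 2\left[\sqrt{2^t\log\frac{2k_0}{\gamma}} + \log\frac{2k_0}{\gamma}\right] \le 2^t\left[6 + 4\log\frac{k_0}{2^t}\right] + 3\log\frac{2k_0}{\gamma},$$

we obtain the following condition:

$(R_{3,k})$: there exists $t \in I$ such that

$$\frac12\inf\{\|\Pi_S\mu\|_n^2,\ S\in B_{2^t}\} \ge \frac{A(k,t)}{N_{k,t}}\left(\|\mu\|_n^2 + \sigma^2\left(2 + \frac3n\log\frac{2k_0}{\gamma}\right)\right) + \frac{\sigma^2}{n}\left(2^t\left[6 + 4\log\frac{k_0}{2^t}\right] + 3\log\frac{2k_0}{\gamma}\right),$$

where

$$A(k,t) = 2^t\left[1 + K_1(k,t) + K_3(k,t)\log\frac{e(p-k)}{2^t}\right] + (1 + K_2(k,t))\log\frac{|T_k|}{\alpha}.$$

The condition $(R_{3,k})$ leads to (B.6), and thus

$$\begin{aligned}
\mathbb{P}_\mu(\hat J \neq J) &\le \mathbb{P}_\mu(\{\hat J \neq J\}\cap A_{k_0}) + \mathbb{P}(A_{k_0}^c)\\
&\le \sum_{j=0}^{k_0-1}\mathbb{P}_\mu(\{\hat k = j\}\cap A_{k_0}) + \mathbb{P}_\mu(\{\hat k > k_0\}\cap A_{k_0}) + \mathbb{P}(A_{k_0}^c)\\
&\le k_0\,\frac{\gamma}{k_0} + \alpha + \delta.
\end{aligned}$$

And then (19) is proved.

□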



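Proof of Remark 3.7. In the following, $C(a,b)$ denotes a constant depending on the parameters $a$ and $b$. Under the assumption that $2^t \le (n-k)/2$, and since $\log(x)/x \le 1$ for all $x \ge 2$, we have:

$$\frac{D_{k,t}}{N_{k,t}}\log\frac{p-k}{D_{k,t}} = \frac{2^t}{n-k-2^t}\log\frac{p-k}{2^t} \le 2\,\frac{2^t}{n-k}\log\frac{n-k}{2^t} \le 2.$$

Moreover, the ratio $D_{k,t}/N_{k,t}$ is bounded by 1, thus

$$\log(m_p) = 4\frac{D_{k,t}}{N_{k,t}} + 4\frac{D_{k,t}}{N_{k,t}}\log\frac{p-k}{D_{k,t}} \le 12.$$

As the ratio $4L_t/N_{k,t}$ is bounded by $C'(\alpha)$ and since $M \le 2\exp(C'(\alpha))\exp(12)$, $M$ is bounded by $C''(\alpha)$. Thus $K_1(k,t) \le \sqrt2$, $K_2(k,t) \le 3C''(\alpha)$ and $K_3(k,t) \le 2\sqrt2 + 3C''(\alpha)$. Under the condition $\log(p-k) \ge 1$, we obtain $A(k,t) \le 2^t C(\alpha)\log(p-k)$.

We also have that

$$\|\mu\|_n^2 + \sigma^2\left(2 + \frac3n\log\frac{2k_0}{\gamma}\right) \le C(\|\mu\|_n,\gamma,\sigma) \quad\text{since } \log(k_0)/n \le 1,$$

and that

$$2^t\left[6 + 4\log\frac{k_0}{2^t}\right] + 3\log\frac{2k_0}{\gamma} \le 2^t\left[6 + 4\log\frac{k_0}{2^t} + 3\log\frac{2k_0}{\gamma}\right] \le 2^t C(\gamma)\log(k_0).$$

We finally obtain equation (20).

□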



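Proof of Corollary 3.8. The differences between the conditions $(R_{3,k})$ and $(R_{3bis,k})$ lie in the fact that $\inf\{\|\Pi_S\mu\|_n^2,\ S\in B_{2^t}\} = \sum_{j=1}^{2^t}\beta^2_{\sigma_2(j)}$ and that the upper bound of $Q_{1-\gamma/(2k_0)}$ is modified, where $Q_{1-\gamma/(2k_0)}$ is defined by $\mathbb{P}(\{\|Y - \Pi_{V_{(k)}+S_{(k),t}}Y\|_n^2 > Q_{1-\gamma/(2k_0)}\}\cap A_{k_0}) \le \gamma/(2k_0)$. Indeed, on the event $A_{k_0}$, we have $\|Y - \Pi_{V_{(k)}+S_{(k),t}}Y\|_n^2 \le \sum_{j=k+1}^{k_0}\beta^2_{\sigma_2(j)} + \|\epsilon\|_n^2$, where $\sigma_2$ is defined such that $|\beta_{\sigma_2(1)}| \le \dots \le |\beta_{\sigma_2(k_0)}|$. The condition $(R_{3bis,k})$ follows. □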